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An iterative algorithm is given for solving a system Ax=k of n linear equations in n 
unknowns. The solution is given in n steps. It is shown that this method is a special 
case of a very general method which also includes Gaussian elimination. These general 
algorithms are essentially algorithms for finding an n dimensional ellipsoid. Connections 
are made with the theory of orthogonal polynomials and continued fractions. 



1. Introduction 

One of the major problems in machine computa- 
tions is to find an effective method of solving a 
system of n simultaneous equations in n unknowns, 
particularly if n is large. There is, of course, no 
best method for all problems because the goodness 
of a method depends to some extent upon the 
particular system to be solved. In judging the 
goodness of a method for machine computations, one 
should bear in mind that criteria for a good machine 
method may be different from those for a hand 
method. By a hand method, we shall mean one 
in which a desk calculator may be used. By a 
machine method, we shall mean one in which 
sequence-controlled machines are used. 

A machine method should have the following 
properties: 

(1) The method should be simple, composed of a 
repetition of elementary routines requiring a mini- 
mum of storage space. 

(2) The method should insure rapid convergence 
if the number of steps required for the solution is 
infinite. A method which — if no rounding-off errors 
occur — will yield the solution in a finite number of 
steps is to be preferred. 

(3) The procedure should be stable with respect 
to rounding-off errors. If needed, a subroutine 
should be available to insure this stability. It 
should be possible to diminish rounding-off errors 
by a repetition of the same routine, starting with 
the previous result as the new estimate of the 
solution. 

(4) Each step should give information about the 
solution and should yield a new and better estimate 
than the previous one. 

(5) As many of the original data as possible should 
be used during each step of the routine. Special 
properties of the given linear system — such as having 
many vanishing coefficients — ^should be preserved. 
(For example, in the Gauss elimination special 
properties of this type may be destroyed.) 

In our opinion there are two methods that best fit 
these criteria, namely, (a) the Gauss elimination 

1 This work was performed on a National Bureau of Standards contract with 
the University of California at Los Angeles, and was sponsored (in part) by the 
Office of Naval Research, United States Navy. 

2 National Bureau of Standards and University of California at Los Angeles. 

3 University of California at Los Angeles, and Eidgenossische Technische 
Hochschule, Zurich, Switzerland. 



method; (b) the conjugate gradient method presented 
in the present monograph. 

There are many variations of the elimination 
method, just as there are many variations of the 
conjugate gradient method here presented. In the 
present paper it will be shown that both methods 
are special cases of a method that we call the method 
of conjugate directions. This enables one to com- 
pare the two methods from a theoretical point of 
view. 

In our opinion, the conjugate gradient method is 
superior to the elimination method as a machine 
method. Our reasons can be stated as follows : 

(a) Like the Gauss elimination method, the method 
of conjugate gradients gives the solution in n steps if 
no roimding-off error occurs. 

(b) The conjugate gradient method is simpler to 
code and requires less storage space. 

(c) The given matrix is unaltered during the proc- 
ess, so that a maximum of the original data is used. 
The advantage of having many zeros in the matrix 
is preserved. The method is, therefore, especially 
suited to handle linear systems arising from difference 
equations approximating boundary value problems. 

(d) At each step an estimate of the solution is 
given, which is an improvement over the one given in 
the preceding step. 

(e) At any step one can start anew by a very 
simple device, keeping the estimate last obtained as 
the initial estimate. 

In the present paper, the conjugate gradient rou- 
tines are developed for the symmetric and non- 
symmetric cases. The principal results are described 
in section 3. For most of the theoretical considera- 
tions, we restrict ourselves to the positive definite 
symmetric case. No generality is lost thereby. We 
deal only with real matrices. The extension to 
complex matrices is simple. 

The method of conjugate gradients was developed 
independently by E. Stiefel of the Institute of Applied 
Mathematics at Zurich and by M. R. Hestenes with 
the cooperation of J. B. Rosser, G. Forsythe, and 
L. Paige of the Institute for Numerical Anal}' sis, 
National Bureau of Standards. The present account 
was prepared jointly by M. R. Hestenes and E. 
Stiefel during the latter's stay at the National Bureau 
of Standards. The first papers on this method were 
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given by E. Stiefel ^ and by M. R. Hestenes.^ Reports 
on this method were given by E. Stiefel ^ and J. B. 
Rosser ^ at a Symposium^ on August 23-25, 1951. 
Recently, C. Lanczos ^ developed a closely related 
routine based on his earlier paper on eigenvalue 
problem.^^ Examples and numerical tests of the 
method have been by R. Hayes, U. Hochstrasser, 
and M. Stein. 

2. Notations and Terminology 

Throughout the following pages we shall be con- 
cerned with the problem of solving a system of linear 
equations 



C^21^1+«223:2+ . . . +(l2nXn=k2 



(2:1) 



^nl'^l~r<^'»2^2~r 



\Cinn'^n — ^Tf 



These equations will be written in the vector form 
Ax=k. Here A is the matrix of coefficients (a^^), 
X and k are the vectors (xi,. . .,Xn) and (ti,. . .,^n). 
It is assumed that A is nonsingular. Its inverse A~^ 
therefore exists. We denote the transpose of A by 
A*. 

Given two vectors x=(xi,. . .,Xn) and y = 
iVi,' ' ',yn)y their sum x+y is the vector 
fe + ^/b- • .,^n + yn), and ax is the vector (axi,. . .,a:r„), 
where a. is a scalar. The sum 

{:x,y)=xiyi+X2y2 + - . '+Xnyn 
is their scalar product. The length of x will be denoted 

by 

\x\^{xl+. . , + xlf={x,x)K 

The Cauchy-Schwarz inequality states that for all 

{x,yy<{x,x){y,y) or \{x,y)\<\x\\y\. (2:2) 

The matrix A and its transpose A* satisfy the 
relation 

n 

{x,Ay)^J^aijXiyj=(A'^x,y). ' 

u=i 

If aij=aji, that is, if A=A'^y then A is said to be 
symmetric. A matrix A is said to be positive definite 
in case (a;,Aa:)>0 whenever xf^O. If {x,Ajo)'^0 for 
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all X, then A is said to be nonnegative. If A is sym- 
metric, then two vectors x and y are said to be con- 
jugate or A-orthogonal if the relation {x,Ay)=^ 
{Ax,y)=0 holds. This is an extension of the ortho- 
gonality relation {x,y)=Q. 

By an eigenvalue of a matrix A is meant a number 
X such that Ay = \y has a solution 2/^0, and y is 
called a corresponding eigenvector. 

Unless otherwise expressly stated the matrix A, 
with which we are concerned, will be assumed to be 
symmetric and positive definite. Clearly no loss of 
generality is caused thereby from a theoretical point 
of view, because the system Ax=k is equivalent to 
the system Bx=l, where B=A^A, l=A'^k. From a 
numerical point of view, the two systems are differ- 
ent, because of rounding-off errors that occur in 
joining the product A* A. Our applications to the 
nonsymmetric case do not involve the computation 
of A*A. 

In the sequel we shall not have occasion to refer to 
a particular coordinate of a vector. Accordingly 
we may use subscripts to distinguish vectors instead 
of components. Thus Xq will denote the vector 
(xoi, . . ., Xon) and Xi the vector (xu, . . ., Xin). In case 
a symbol is to be interpreted as a component, we shall 
call attention to this fact unless the interpretation is 
evident from the context. 

The solution oj the system Ax=k will be denoted by 
h; that is, Ah=^k. If x is an estimate of A, the differ- 
ence r=k—Ax will be called the residual of x as an 
estimate of h. The quantity \r\^ will be called the 
sauared residual. The vector h—x will be called the 
error vector of x, as an estimate of h. 

3. Method of Conjugate Gradients (cg- 
Method) 

The present section will be devoted to a description 
of a method of solving a system of linear equations 
Ax=k. This method will be called the conjugate 
gradient method or, more briefly, the cg-method, for 
reasons which will unfold from the theory developed 
in later sections. For the moment, we shall limit 
ourselves to collecting in one place the basic formulas 
upon which the method is based and to describing 
briefly how these formulas are used. 

The cg-method is an iterative method which 
terminates in at most n steps if no rounding-off 
errors are encountered. Starting with an initial 
estimate Xq of the solution h, one determines succes- 
sively new estimates Xo, Xi, X2f . . . of A, the estimate 
Xi being closer to h than Xi+i. At each step the 
residual ri=k—Axi is computed. Normally this 
vector can be used as a measure of the ^ 'goodness'^ 
of the estimate Xi. However, this measure is not a 
reliable one because, as will be seen in section 18, 
it is possible to construct cases in which the squared 
residual |r^p increases at each step (except for the 
last) while the length of the error vector \h—Xi\ 
decreases mono tonic ally. If no rounding-off error 
is encountered, one will reach an estimate Xmi^rn^n) 
at which r^^^^O. This estimate is the desired solu- 
tion h. Normally, m=n. However, since rounding- 
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off errors always occur except under very unusual 
circumstances, the estimate Xn in general will not be 
the solution h but will be a good approximation of h. 
If the residual r„ is too large, one may continue 
with the iteration to obtain better estimates of h. 
Our experience indicates that frequently j^^+i is 
considerably better than x„. One should not con- 
tinue too far beyond Xn but should start anew 
with the last estimate obtained as the initial 
estimate, so as to diminish the effects of rounding- 
off errors. As a matter of fact one can start anew 
at any step one chooses. This flexibility is one of the 
principal advantages of the method. 

In case the matrix A is symmetric and 'positive 
dejinite, the following formulas are used in the con- 
jugate gradient method: 

Po'=^'o=k — Axo {xq arbitrary) (3:1a) 



(3:1b) 

(3:1c) 
(3:ld) 

(3:le) 
(3:lf) 



(Pi,Apiy 

Xi^i^=Xi-{-aiPi, 
ri+i = ri—aiApi, 






\ri\ 



Pi-i-i='ri^i+biPi. 



In place of the formulas (3:1b) and (3-le) one may 
use 



^ _ {Pi,ri) 
'''-{p,,Ap,y 



b,= - 



{r^^^,Ap^) 



iPi,APi) 



(3:2a) 



(3:2b) 



Although these formulas are slightly more compli- 
cated than those given in (3:1), they have the ad- 
vantage that scale factors (introduced to increase 
accuracy) are more easily changed during the course 
of the computation. 

The conjugate gradient method (cg-method) is 
given by the following steps: 

Initial step: Select an estimate Xq of h and com- 
pute the residual Tq and the direction po by formulas 
(3:1a). 

General routine: Having determined the estimate 
Xi of h, the residual r^, and the direction pt, compute 
Xi+i, r^+i, and Pi+i by formulas (3:1b), . . ., (3: If) 
successively. 

As will be seen in section 5, the residuals ro, ri, 
. . . are mutually orthogonal, and the direction vec- 
tors Po, Pi, . . . arc mutually conjugate, that is, 

(ru rj) = 0, {pu Apj)=0 {i^j). (3:3) 

These relations can be used as checks. 

Once one has obtained the set of n mutually 
conjugate vectors 2^0, • • -j pn-i tlie solution of 



Ax = k' 
can be obtained by the formula 

""-UiAp^p,)^'' 



(3:4) 



(3:5) 



It follows that, if we denote by pij the jtli component 
of Pi, then 

-^A PijP ik 
f£^ {Pi, Ap^) 

is the element in the jth row and kill column of the 
inverse ^"^ of ^. 

There are two objections to the use of formula 
(3:5). First, contrary to the procedure of the 
general routine (3:1), this would require the storage 
of the vectors Po, Pi, - - - . This is impractical, 
particularly in large systems. Second, the results 
obtained by this method are much more influenced 
by round ing-oft' errors than those obtained by the 
step-by-step routine (3:1). 

In the cg-method the error vector h—xis diminished 
in length at each step. The quantity /(j)=(/^— a*, 
A (h — x)), culled the error function, is also diminished 
at each step. But the squared residual \r\^=\k—Ax\^ 
normally oscillates and may even increase. There 
is a modification of the cg-method where all three 
quantities diminish at each step. This modification 
is given in section 7. It has an advantage and a 
disadvantage. Its disadvantage is that the error 
vector in each step is longer than in the original 
method. Moreover, the computation is complicated, 
since it is a routine superimposed upon the original 
one. However, in the special case where the given 
linear equation system arises from a difference 
approximation of a boundary-value problem, it can 
be shown that the estimates are smoother in the 
modified method than in the original. This may be 
an advantage if the desired solution is to be diflPer- 
entiated afterwards. 

Concurrently with the solution of a given linear 
system, characteristic roots of its matrix may l)e 
obtained: compute the values of the polynomials 
i?o, /?],.. . and Po, Pi, ... at X by the iteration 

Ro = Po=l 



Ei+i — Ki — \aiPi 



(3:6) 



The last polynomial 7?,,^^) is a factor of the charac- 
teristic polynomial of A and coincides with it when 
m=n. The characteristic roots, which are the zeros 
of i?m(X), can be found by Newton's methods without 
actually computing the polynomial BmW itself. 
One uses the formulas 






(3:7) 
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where Rm(Kk), RLO^k) are determined by the iteration 
(3:6) and, 

Pi^i = Ri+biPi 

with X=Xfc. In this connection, it is of interest to 
observe that if m=n, the determinant of A is given 
by the formula 



det (.1) = 



a^ai . . . dn-i 

The cg-method can be extended to the case in 
which ^ is a general nonsymmetric and nonsingular 
matrix. In this case one replaces eq (3:1) by the set 

ro=k — Axo, po=-4*ro, 

Xi+i=Xi+aiPi, 
ri+i = ri—aiA'Pi, 



(3:8) 



hi= 



|^*r.4-i 



This system is discussed in section 10. 

4. Method of Conjugate Directions (cd- 
Method)^ 

The cg-method can be considered as a special case 
of a general method, which we shall call the method 
of conjugate directions or more briefly the cd-method. 
In this method, the vectors :Poj Pi, • • • are selected 
to be mutuall}^ conjugate but have no further restric- 
tions. It consists of the following routine: 

Initial step. Select an estimate Xq of h (the solu- 
tion), compute the residual rQ=k—AxQ, and choose a 
direction 2>o- 

General routine. Having obtained the estimate 
Xi of h, the residual ri=k—Axi and the direction pi, 
compute the new estimate Xi^i and its residual r^+i 
by the formulas 



iPi,Apiy 

r^^i = r,—aiApi. 



(4:1a) 

(4:1b) 

(4:1c) 



1 This method was presented from a different point of view by Fox, Huskey, and 
Wilkinson on p. 149 of a paper entitled "Notes on the solution of algebraic linear 
simultaneous equations," Quarterly Journal of Mechanics and Applied Mathe- 
matics c. 2, 149-173 (1948). 



Next select a direction ^j+i conjugate to 'p^, 
that is, such that 



•, Ph 



{p,+,,Ap,) = Q (i=0,l. 



,i). (4:2) 



In a sense the cd-method is not precise, in that no 
formulas are given for the computation of the direc- 
tions Po, Pi, ' ' • • Various formulas can be given, 
each leading to a special method. The formula 
(3: If) leads to the cg-method. It will be seen in 
section 12 that the case in which the ^^'s are obtained 
by an ^-orthogonalization of the basic vectors 
(1,0, . . ., 0), (0,1,0, ...),... leads essentially to 
the Gauss elimination method. 

The basic properties of the cd-method are given by 
the following theorems. 

Theorem 4:1 1, The direction vectors po, pi, • • • are 
mutually conjugate. The residual vector r^ is orthogonal 
lo Po, Pi, ' • ', Pi-i' The inner product oj Pi with each 
of the residuals Tq, ri, • • -, rfis the same. That is, 



(Pi,Apj) = {i9^j) 
(Pj,ri) = (i=0,l, . . .,i-l) 
{Pi,ro) = {puri)= • • • ={p.t,ri). 
The scalar a t can be given by the formula 

(Pi^ro) 



ai= 



{Pi,Api) 



(4:3a) 
(4:3b) 
(4:3c) 

(4:4) 



in place of (4:1a). 

Equation (4:3a) follows from (4:2). Using (4:1c), 
we find that 

(Pj,^k+i) = {Pj,rk)-ak{pj,Apjc). 

1{ j=k we have, by (4:1a), {pk,rk-^i) = 0. Moreover, 
by (4:3a) (Pj,rjc^i) = lpj,rk), (jy^k). Equations (4:3b) 
and (4:3c) follow from these relations. The formula 
(4:4) follows from. (4:3c) and (4:1a). 

As a consequence of (4:4) the estimates a::i,X2, • • • 
of h can be computed without computing the resid- 
uals r^ri, • • •, provided that the choice of the 
direction vectors Po,Pi, • • • is independent of these 
residuals. 

Theorem 4:2. The cd-method is an m-step method 
(m ^ n) in the sense that at the mth step the estimate Xm 
is the desired solution h. 

For let m be the first integer such that yo=h—Xo^is 
in the subspace spanned by po, • • •, Pm-i- Clearly, 
m ^71, since the vectors ^0,^1, • • • are linearly inde- 
pendent. We may, accordingly, choose scalars 
ojo, • • •, ccrn-i such that 



Hence, 



yo=(^oPo+ 



h = Xo + aoPo + 



+ am-iPni-l' 



+ arn-lPn 



Moreover, 



ro=k—Axo=A(h—Xo) = acA2)o+ • • ' + am-iA2)n 
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Computing the inner product (2)^/0) we find by (4 :3a) 
and (4:4) that at^^ai, and hence tliat h=Xm, as was 
to be proved. 

The cd-method can be looked upon as a relaxation 
method. In order to estabhsh this result, we intro- 
duce the function 

J{x) = (h-x,A(h-x))={x,Ax)-2(x,k) + {h,k). (4:5) 

Clearly, /'(x) ^ and y(x) = if, and only if, x=h. 
The function J{x) can be used as a measure of the 
''goodness'^ of x as an estimate of h. Since it plays 
an important role in our considerations, it will be 
referred to as the error function. If j? is a direction 
vector, we have the useful relation 

f{x + ap)=f(x)-2aip,r) + a\p,Ap), (4:6) 

where r=k—Ax=A{h—x)j as one readily verifies 
by substitution. Considered as a function of a, 
the function j{x+a]i) has a minimum value at 
a=a, wdiere 

{V.Ap) ^^-^^ 

This minimum value differs from/(.r) by the quantity 

{p,ry 



f{x)-j{x + ap)^a\p,Ap)==^ 



(p,Ap) 



(4:8) 



Comparing (4:7) with (4:1a), we obtain the first two 
sentences of the following result : 

Theorem 4:3. The point Xt minimizes J (x) on the 
line x=Xi_i + api^i. At the i-th step the error f(Xi^i) 
is relaxed by the amount 



j{Xi-i) — j{Xi)-- 



_ (pi.i,ri_,y 
'{pi.i,Api_^) 



(4:9) 



In fact, the point Xi minimizes f{x) on the i-dimensional 
plane P i of points 



X = Xo+aoPo+... + ai_iPi_i, 



(4:10) 



where ao, ..., a^.i are parameters. This plane con- 
tains the points Xq,X], ..., Xi. 

In view of this result the cd-method is a method 
of relaxation of the error furction/(it;). An iteration 
of the routine may accordingly be referred to as 
a relaxation. 

In order to prove the third sentence of the theorem 
observe that at the point (4:10) 



f(x)^f{xo)-J2[2aj{pj,r,)- 
At the minimum point we have 



-a%pj,Apj)]. 



ar 



iVh^Vj) 



and hence aj=aj, by (4:4). The minimum point is 
accordingly the point Xf, as was to be proved. 



Geometrically, the equation /(x) = const, defines 
an ellipsoid of dimension n — 1. The point at which 
f{x) has a minimum is the center of the ellipsoid and 
is the solution of Ax=k. The i-dimensional plane 
Pi, described in the last theorem, cuts the ellipsoid 
f{x)=f(xo) in an ellipsoid Ei of dimension i—l, 
unless Ei is the point Xq itself. (In the cg-method, 
Ei is never degenerate, unless Xo=h.) The point Xi is 
the center of Ei. Hence we have the corollary: 

Corollary 1. The point Xi is the center of the 
(i—l) -dimensional ellipsoid in which the i-dimensional 
plane Pi intersects the (n—1) -dimensional ellipsoid 
f{x)=f{x,). 

Although the function f(x) is the fundamental 
error function which decreases at each step of the 
relaxation, one is unable to compute f{x) without 
knowing the solution h we are seeking. In order to 
obtain an estimate of the magnitude of /(x) we may 
use the follow^ing: 

Theorem 4:4. The error rector y=h—x, the residual 
r=k—Ax, and the error function fix) satisfy the 
relations 

U|2 1.^12 

J^< /(t)<-^-> (4-11) 



where y^{z) is the Rayleigh quotient 

{z,Az) 



^i(z)-- 



\z\' 



The Eayleigh quotient of the error 
exceed that of the residual r, that is, 



Moreover, 



m(?/)^m(0. 



M(r) = ''^'=M(^) 



(4:12) 
vector y does not 

(4:13) 
(4:14) 



The proof of this result is based on the Scliwarzian 
quotients 

{z, Az) ^ {Az,Az) ^ {Az,A''z ) , 

{z,z) = {z,Az) ={Az,Az)' ^ ' 

The first of these follows fi'om the inequality of 
Schwarz 

\{p,q)\'^(p,p){q,q) (4:16) 

by choosing p = z, q=Az. The second is obtained 
by selecting p = Bz, q=B^z, where B^=A. 

In order to prove theorem 4:4 recall that if we 
set y=h—x, then 

r=k—Ax=A{h—x)=Ay 

f(:x) = (y,Ay) 

by (4:5). Using the inequalities (4:15) with z=y, 
we see that 



m(?/) = 



(y,Ay) ^ {Ay,Ay )_ \r\' ^ {Ay,A'y) 
(y,y) - {y,Ay) f{x)-(Ay,Ay) 



{r,A7') 



{r,r) 



= ix{r). 
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This yields (4:11) and (4:13). Using (4:16) with 
V=y and g=r we find that 



Hence 



f{x)={yAy)={y,r)^\y\ \r\. 
J(x) = fi(y)\y\^^\y\ |r|, 



so that the second inquality in (4:14) holds. The 
first inequality is obtained from the relations 



/x(r) 



^f{x)S\y\\r\. 



As is to be expected, any cd-method has within 
its routine a determination of the inverse A~^ of A. 
We have, in fact, the following: 

Theorem 4:5. Let po, . . ., pn-i be n mutually con- 
jugate nonzero vectors and let Pij be the j-th component 
oj Pi. The element in the j-th row and k-th column of 
A~^ is given by the sum 



s 



Pijpik 



ri {pi,Api) 
This result follows from the formula 

(PiJc) 



71-1 



^ (Pi,Api) 



Pi 



for the solution h of Ax=k, obtained by selecting 

We conclude this section with the following: 
Theorem 4:6. Let Ti be the {n—i) -dimensional 
plane through Xi conjugate to the vectors Po, Pu • • •? 
Pi-i. The plane Wi contains the points Xt, Xi+i, . . . 
and intersects the {n — 1) -dimensional ellipsoid f(x) = 
f{Xi) in an ellipsoid E- oj dimension in—i—l). 
The center of El is the solution h of Ax=k. The point 
Xi+i is the midpoint of the chord d of El through Xu 
which is parallel to pi. Ln the cg-method the chord d 
is normal to El at Xi and hence is in the direction of 
the gradient of f{x) at Xi in tti. 

The last statement will be established at the end 
of section 6. The equations of the plane ttj is given 
by the system 

{Apj,X — Xi) = {) 0'=0;lr • v^— !)• 

Since p^,;Pi+i,. . . are conjugate to ;Por • .,:Pi-i, so also 
is 

Xk—Xi=aiPi-{-. . .+ak-iPk-i {kyi). 

The points Xi,XiJ^i,. . .,x„, = h are accordingly in tt^, 
and h is the center of £" • . The chord d is defined by 
the equation x=Xi-\-taiPi, where t is a parameter. As 
is easily seen, 

f{Xi+taipl) =f(x,) - (2t-f)aKPi,Api) ■ 

The second endpoint of the chord d is the point 
Xi+2aiPi at which t = 2. The midpoint corresponds 
to t=l, and hence is the point Xt^i as was to be 
proved. 



In view of theorem 4:6, it is seen that at each step 
of the cd-routine the dimensionality of the space Wi 
in which we seek the solution h is reduced by unity. 
Beginning with Xq, we select an arbitrary chord Co of 
f{x) =f(xo) through Xq and find its center. The plane 
TTi through Xi conjugate to d contains the centers of 
all chords parallel to d- In the next step we re- 
strict ourselves to tti and select an arbitrary chord 
d of f(x) =f(xi) through Xi and find its midpoint 
X2 and the plane tto in tti conjugate to d (and hence 
to d)' This process when repeated will yield the 
answer in at most n steps. In the cg-method the 
chord Ci of f{x)=f{Xi) is chosen to be the normal 
at Xi. 

5. Basic Relations in the cg-Method 

Recall that in the cg-method the following formulas 
are used 



Po — ^0 — "^ -^Xq 


(5: la) 


a- |'■^|^ 


(5:1b) 


' iPi,Apd 


Xi+i=Xi+aiPi 


(5:lc) 


ri+i=ri—atAj)i 


(5: Id) 




(5:le) 


Pi+i = ri+i + bip,. 


(5: If) 



One should verify that eq. (5:le) and (5: If) hold for 
i=0,l,2,. . . if , and only if , 



Pk=\ric 



k ^ 



k = 0,\,2,. . .. 



(5:2) 



The present section is devoted to consequences of 
these formulas. As a first result, we have 

Theorem 5:1. The residuals Tq, ri, . . . and the 
direction vectors Po, Pi, . . . generated by {5:1) satisfy 
the relations 

(r,,r,)=0 (iT^j) (5:3a) 

{Pi,Apj)=0 (iy^j) (5:3b) 

(:Pz/;)-0 (Ki), {Vi.rj) = \ri\' {i^j) (5:3c) 

{ri,Ap,) = {pi,Ap,), {riAp,) = {) {i^j,i^j+l) (5:3d) 

The residuals ro, r^, . . . are mutually orthogonal and 
the direction vectors Po, Pi, . . . are mutually conju- 
gate. 

The proof of this result will be made by induction. 
The vectors ro, po =ro and ri satisfy these relations 
since 

{rQ,ri) = {po,r{) = \ro\^~-ao(ro,Apo)=0 
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by (5:1b). Suppose that (5:8) holds for the vectors 
To, . . ., Vu and p^, . . ., Pk-i- To show that p^ can 
be adjoined to this set it is necessary to show that 






(i^k) (5:4a) 

{i<k) (5:4b) 

(i^k,i^k-l) (5:4c) 



Equation (^5 :4a) follows at once from (5:2) and 
(5 :3a) . To prove (5 :4b) we use (5 : Id) and find that 

(ri+i,Pk) = (ruPfc) ~ai{Api,pj>i. 

By (5:4a) this becomes 



\rk\^—ai{Api,pf, 



{i<k)' 



Since «<>0, eq (5:4b) hokls. In order to establish 
(5:4c), we use (5: If) to obtain 



{Pk,Ap,)=^{rj,,Ap,)+hf,_^(pj:_^,Ap;) 



-{r,,Ap,) 

{i9^k-l)- 



It follows that (5:4c) holds and hence that (5:3) 
holds for the vectors Fq, fi . . ., r^ and Po, pi, . . ., 

It remains to show that r^^+i can be adjoined to this 
set. This will be done by showing that 



{r i,r k^i)=-0 


(i^k) 


(5:5a) 


(Api,r^+i) = 


HKk) 


(5:5b) 


(Pi,rjc+i)=0 


(i^k). 


(5:5c) 


By (5: Id) we have 






(r^r^+i) = (Ti^r^) - 


-ak(ri,Apk). 





If i<ik, the terms on the right are zero and (5:5a) 
holds. If i — k, the right member is zero by (5:1b) 
and (5:3d). Using (5: Id) again we have with i-^k 

0= (r^+i/i+i) = (rk+i,ri)—ai(rjc+i,Api) -:-■ —ai(rk^i,Api) . 

Hence (5:5b) holds. The equation (5:5c) follows 
from (5:5a) and the formula (5:2) for^?j. 

As a consequence of (5:3b) we have the first two 
sentences of the following: 

Theorem 5:2. The cg-method is a cd-inethod. It h 
the special case oj the cd-method in which the pt are 
obtained by A-orthogonalization of the residual vectors 
ri. On the other hand, a cd-method in which the resid- 
uals To, r^, . . . are mutually orthogonal is essentially 
a cg-method. 

The term '^essentially'' is used to designate that 
we disregard iterations that terminate in fewer than 
n steps, unless one adds the natural assumption that 
the formula for pi in the routine depends continu- 
ously on the initial estimate Xq. T6 prove this result 



we accordingly suppose that the routine terminates 
at the n-th step. Since the r^ is orthogonal to 
r^+i we have Xi^^Xf^i and hence at 9^0. It follows that 
{Pi,^i}^Ohj (4:1a). Wemay accordingly suppose the 
vectors 2? i have been normalized so that (^t/<) = KI^- 
In view of (4:3b) and (4:3c) eq (5:3c) holds. Select 
numbers a^- such that 

n-l 

Pz=S oiijrj. 
j=o 

Taking the inner product of pt with rj it is seen by 
(5:3 c) that 

\' J\ 

Consequently, (5:2) holds and the theorem is estab- 
lished. 

Theorem 5:3. The residual rectors ro, ri, . . . and 
the direction rectors po, Pu . . . satisfy the further 
relations 



\'Pi\ 



(Pi, Pj) 

\ri\'+b^ 



\rj\'\Pi\' 
\ri\' 



(i^j) 



il7)t-il'=k,hZl-nrr2 

j=0 l^jl 



ir„Arj) = \i-j\>l 
iri,Ari) = (pi,Api) + bf_,{pi_uAPi-i) 



(5:6a) 

(t>0) 
(5:6b) 

(5:6c) 

(i>0). 
(5:6d) 



The vector r^ is shorter than pi. The vector Pi makes 
an acute angle with pj. 

The relations (5:6a) and (5:6b) follow readily from 
(5:le), (5:lf), (5:2), and (5:3). Using (5:lf) and 
(5:3d), we see that 

{ri,Arj) = (:ri,Apj)-bj_,{ri,Apj^{) = (i<i-l). 

Hence (5:6c) holds. Equation (5:6d) is a conse- 
quence of (5:lf) and (5:3b). The final statements 
are interpretations of formula (5:6b) and (5:6a). 

Theorem 5:4. The direction vectors po, pi, ... 
satisfy the relations 

Pi={l + bo)po—aQApo (5:7a) 

Pi+i^(l + bi)Pi-aiAp-bi.iPi,i (i>0). (5:7b) 

Similarly, the residuals r^, ri, . . . satisfy the relations 

ri = ro—a^ArQ (5:8a) 

r/+i=(l + 6;-i)r,-a,ylr,-&;_ir,_i, (5:8b) 



where 



bi-.=-^ 6,_i. (5:9) 
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Equation (5:7b) is obtained by eliminating ri+i 
and Vi from the equations 

ri^i=ri — aiApi 

Pi=ri + bi-iPi-i. 

Equation (5:7a) follows similarly. In order to prove 
(5:8b), eliminate Api and Apt^i from the equations 

ri^i = ri—aiApi 

Api=Ari + bi-]Api.i 

ri=ri-i—ai.iApi_i. 

Equation (5:8a) holds since po=ro. 

Theorem 5:5. The scalar s at and hi are given by the 
several formulas 



ai = 



Ifil' 



(Pi,ri) _ (Pi.ro) 



{Pi,Api) {pi,Api) {pi,Api) 



Fi+i[ 



{ri+i,Api)_ (ri+i,zlrO 



' \r,\' (Pi,Ap,) (Pi,Api) 

The scalar at satisfies the relation 



(5:10) 
(5:11) 



1 



= M(ro), m(?>0< ;J-<M(r.) {i>0), (5:12) 



where }x{z) is the Rayleigh quotient (4:12). The recip- 
rocal of at lies between the smallest and largest char- 
acteriiSti c root^ or ^4 

The formula ^(5: 10) follows from (5:1b) and (5:3c), 
while (5:11) follows from (5:le), (5: If), (5:3b), and 
(5:3d). Since 

k^• \<\Pi\^ i^i, -^^i) > (pi, ^Pi) 

by (5:6b) and (5:6d), we have 

{pi, Api) ^ {pi, Api) {rj, Art) 



\Pi 



ri 



The inequalities (5:12) accordingly hold. The last 
statement is immediate, since m(^) lies between the 
smallest and largest characteristic roots of A, 

6. Properties of the Estimates x, of h in the 
cg-Method 

Let now Xq, Xi, . . ., Xm=h he the estimates of h 
obtained by applying the cg-method. Let ro, ri, 
. . ., rm=0 be the corresponding residuals and po, 
Pi, . . ., Pm-i the direction vectors used. The pres- 
ent section will be devoted to the study of the prop- 
erties of the points r^o, ^i, . . ., x^. As a first result 
we have 



Theorem 6:1. The estimates Xq, Xi, - • -, Xm of h are 
distinct. The point Xi minimizes the error function 
f(x) = (h—x,A{h—x)) on the i-dimensional plane Pi 
passing through the points Xq, Xi, - • •, x^. In the ith 
step of the cg-method, f{x) is diminished by the amount 

f{Xi_i)—f(Xi)=ai_i\ri^i\^=fjLipi-i)\xi^i — Xi\^, (6:1) 

where ^x{z) is the Rayleigh quotient (4:12). Hence, 

/(•^i)-/(^;)=«i|'^iP+ • • • +«^,--i|r^-i|^ 
The point Xi is given by the formulas 



(6:2) 



_ , ^ _ I 4F^ /(-g;)-/(^i) 

Xi Xo-]- > , ajPj Xq-f- ? , i \7i -Tj. 

i=o ;=o \rj\ 



(6:3) 



This result is essentially a restatement of theorem 
4:3. The formula (6:3) follows from (5:2) and 
(6:2). The second equation in (6:1) is readily 
verified. 

Theorem 6:2. Let Si be the convex closure of the 
estimates Xq, Xi, • • •, Xi. The point Xf is the point in 
Si, whose error vector h—x is the shortest. 

For a point x 9^X1 in Si is expressible in the form 

X=aQXo+' ' ' + aiXi, 
where ai ^0, a^ + ai-]-' • - + ^,==1. 
We have accordingly 

Xi — X=aQ{x^ — XQ) + ' • ' + OLi^i{Xi — Xi.i)=fiQP(i 

+- • • • +fii~lPi-U 

where the /3's are nonnegative. Inasmuch as all 

(PjjPk)y'0 it follows that 

{Xj—Xi, Xi—x)yO {i<j). 
Using the relation 

\xj—x\'=\xj—x^+2{Xj—Xi, Xi-x) + \Xi-x\'^ (6:4) 
we find that 

\Xj~-Xi\<C\Xj — x\ (i<j)- 

Setting ^'==m, we obtain theorem 6:2. 
Incidentally, we have established the 
Corollary. The point Xi is the point in St nearest to 

the point Xj (j^i) . 

Theorem 6:3. At each step of the cg-algorithm the 

error vector yi=h—Xi is reduced in length. In fact, 



L, !2_|,, |2_/(M±i(^iz±). 

\yi-i\ —WA — 



f^(Pi-i) 

where ^x{z) is the Rayleigh quotient (4:12). 



(6:5) 
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In order to establish (6:5) observe that, by (5:6a), 
{yi,Xi — Xi^i) = {xm — Xi,Pi.i)ai^i 

= [(li(Pi,Pi-i) + . . .+a,n-l{Pm-l,Pi-l)](li-l 

111 view of (6:2) and (5:1b) this becomes 

fix,) 



\ri-i\ 



{yi,Xi — Xi_i)-- 



M (^z-i) 



(6:6) 



Setting x=Xi^i and j=m in (6:4), we obtain (6:5) 
by the use of (6:6) and (6:1). 

This result estabhshes the cg-method as a method 
of successive approximations and justifies the pro- 
cedure of stopping the algorithm before the final- 
step is reached. If this is done, the estimate ob- 
tained can be improved by using the results given 
in the next two theorems. 

Theorein 6:4. Let a:/+i- * • •? ^i'^ ^^ ^^^^ projec- 
tions of the points Xi^i, • • • , Xm=h in the i- 
dimensional plane P, passing through the points Xq, 
a^i, • • • , Xi. The points Xj_i, Xu ^,-+i, • • •, ^m^ lie 
on a straight line in the order given by their enumeration. 
The point xjf^ (/c^i) is given by the jormulas 



xi; =jri_^- 



f{Xi.i)-f{Xk) 



fix,.,)- fix,) 
fix,)-fix^ 



iXi—Xi_ 



Fi-U 



Pi 



(6:7a) 
(6:7b) 



In order to prove this result, it is sufficient to 
establish (6:7). To this end observe first that the 
vector 



Pj-i 



Pi 



ij^i) 



is orthogonal to eacli of the vectors ^o, j^i, * • •> 
Pi^i. This can be seen by formmg the inner product 
with ^z (/<?'), and using (5:6a). The result is 



iVhPj)- 



r,- 



iPiVPi-i)-- 



\Pi\ 



\r,-i\^ ^-'- \ri\^ - - ' - 

The projection of the point 

Xk=x,.i + a,.,Pi.i + aiPi + . . . + «fc-iZ>A:-i 
in Pi is accordingly 



:0 



--Xi-i- 



a^•_l|/^^_^|^+. ■ .+ak.i\r^. 



Pi-i- 



Using (6:2), we obtain (6:7). The points lie in the 
designated order, since JiXk)^J(xjc+i) . 
^\ncef(:Xfn) = 0, we have the first part of 



Theorem 6:5. The point 

X. 






\ri-i\ 



is the point in Pi whose distance from the solution h is 
the least. It lies on the line Xi^iX, beyond Xt. Moreover, 



1 



1 



and 



\h — X: 



fix^') fix,) fix,.,) 

U2 ^fix^')-fiXi) 



\h- 



M ipi-l) 



(6.9) 



(6.10) 



In order to establish (6:9) and 6:10) we use the 
formula 

f(x.,+ api.i)=f(Xi)i-a'{pi.uApi-i), 

which holds for all values of a in view of the fact that 
x, minimizes fix) on P,. Setting a=fiXi)/\r,.i\'^ 
we have 



fix:^')=fix,)+ 



fix.)' 



fix,)- 



fiXiY 



«^z-l|/'z-ir fiXi-ll — fix,) 

An algebraic reduction yields (6:9). Inasmuch as 



\h- 



-Xi\ 



fiXi)'\Pi-l\ 

\ri-i\' 

fiXi)' 



ai-i\Pi-i\ 



fiXi.,)-fix,) \r,_,\' 

we obtain (6:10) from (6:9) and (5:1b). 

As a further result we have 

Theorem 6:6. Let x[, . . ., a^l_i be the projections 
of the points Xi, . . ., x^_i on the line joining the initial 
jwint Xq to the solution Xjri=h. The points Xq, x[, . . ., 
x'm-i, Xm = h lie in the order of enumeration. 

Thus, it is seen that we ])roceed towards the solu- 
tion withouc oscillation. To prove this fact we need 
only observe that 

iXm X(),Xi Xi_i)^= [X m /q^ ^ i - 1 ? 2^ J - 1) 
m-l 

= «/-! S aj{pj,p,.,)yO 

by (5 : 6a) . A similar result holds for the line joiniiag 
Xitoxjii<Cj). 

Let TT, be the (n — i)-dimeiisional plane through x, 
conjugate to j^o, Pi, . . •, Pt-i- It consists of the set 
of points x satisfying the equation 



iApj,x-Xi) = iJ = OA, . 



.,'/-l). 
.,.7",,, and hence 



(6:8) 
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This plane contahis the points Xf^i, 
the solution h. 

Theorem 6 :7. The gradient of the function fix) at x, 
in the plane tv, is a scalar multiple of the vector p,. 

The gradient oifix) at x, is the vector —r,. The 
gradient q, of /(a?) at x, in tt, is the orthogonal projec- 
tion of —r, in the plane w,. Hence q, is of the form 

qi=—ri + aoAp^)+ . . . + ai.iApi.i, 



where ao? • • -ycn-i are chosen so that qi is orthogonal 
to Apo, . . .,Api^i, Since 






' j+i 



= ri 



-ajApj (j=0,\, 



.,^-l), 



it is seen upon ehmination of Tq, Ti, . . ., Vf^i succes- 
sively that Pi is also a linear combination of r,-, ApQ, 
. . .,Api_i. Inasmuch as 2>t is conjugate to 7^0? • • -, 
jt?i_i, it is orthogonal to ^Po, • • -, Api_i. The vector 
Pi accordingly is a scalar multiple of the gradient qi 
of /(x) at Xi in Wi, as was to be proved. 

In view of the result obtained in theorem 6:7 it is 
seen that the name ^'method of conjugate gradients" 
is an appropriate name for the method given in sec- 
tion 3. In the first step the relaxation is made in the 
direction po of the gradient oi J{x) at .Tq, obtaining 
a minimum value of /(x) at Xi. Since the solution h 
lies in tti, it is sufficient to restrict x to the plane tti. 
Accordingly, in the next step, we relax in the direc- 
tion pi of the gradient of /(x) in tti at Xi, obtaining the 
point X2 at which /(x) is least. The problem is then 
reduced to relaxing /(x) in the plane 7r2, conjugate to 
Pq and pi. At the next step the gradient in X2 in X2 
is used, and so on. The dimensionality of the space in 
which the relaxation is to take place is reduced by 
unity at each step. Accordingly, after at most n 
steps, the desired solution is attained. 

7. Properties of the Estimates Xy of h in the 
cg-Method 

In the cg-method there is a second set of estimates 
Xo=Xo, Xi, X2, . ■ ■ oi h that can be computed, and 
that are of significance in application to linear 
S3^stems arising from difference equations approxi- 
mating boundary- valve problems. In these applica- 
tions, the function defined by Xi is smoother than 
that of Xi, and from this point of view is a better 
approximation of the solution h. The point Xi has 
its residual proportional to the conjugate gradient Pi. 
The points Xq, ii, X2, . . . can be computed by the 
iteration (7:2) given in the following: 

Theorem 7:1. The conjugate gradient p i is expressible 
in the form 

Pi=Ci{k-Axi), (7:1) 

where Ci and Xi are defined by the recursion formulas 

Xi^l-f- biCiXi 



^0 — ^0? ^l"+l 

We have the relations 



^t'4-l 



(7:2a) 
(7:2b) 









(7:3a) 



i.^^SA (7:3b) 

Ci ; = OKj| 



ri = k — Axi=-pi = ^'^^^r^^' (7:3c) 

The sum of the coefficients of Xq, Xi, . . ., a^ in (7:3b) 
(and hence of ro,ri, . . ., ri in (7:3c)) is unity. 

The relation (7:1) can be established by induction. 
It holds for i = 0. If it holds for i, then 

Pi^i=ri_^i + b{Pi={l + biCi)k—A{Xi+i + biCiXi) 

= Ci+i{k—Axi+^). 

The formula (7:3a) follows from (7:2a), (5:le) and 
(5:6b). Formula (7:3b) is an easy consequence of 
(7 : 2b). To prove (7 : 3c) one can use (5 : 2) or (7 : 3b), 
as one wishes. The final statement is a consequence 
of (7:3a). 

Theorem 7:2. The point Xi given by (7:2) lies in 
the convex closure Si of the points Xq, Xi, • • •, Xi. It is 
the point x in the i-dimensional plane Pi through Xq, 
Xi, • • ', Xi at which the squared residual \k — Ax\^ has 
its minimum, value. This minimum value is given by 
the formula 



ri 



\n 



\ri\ 



(7:4) 



The squared residuals |ro|^, Ir^l^, • • • diminish mono- 
tonically during the cg-method. At the itk step the 
squared residual is reduced by the amount 



ri-i\ 



\ri\ 



n- 



Ci 



(7:5) 



The first statement follows from (7:3b), since the 
coefficients oi Xq, Xi, - - -, Xi are positive and have 
unity as their sum. In order to show that the 
squared residual has a minimum, on Pi at Xi, observe 
that a point x in Pf differs from x^ by a vector Zt of 
the form 

X—Xi=Zi=aoPo+ • • • + a^_l^^_l. 

The residual r = k — Ax is ^accordingly given by 

r=ri—AZi 
AZi = aoApo+' ' • + ai.iApi^i. 

Inasmuch as, by (7:3c), ri=pi/Ci, wQ: have 

(ri,^Vj)=j {Pi,Apj) = (j<:i). 

Ci 

Consequently, (ri,Azi)=0 and 

\r\'=\?i\'+\Azi\'>\?,\' (xy^Xi). 
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It follows that Xi affords a j)roper miuimiiin to \r\- on 
Pi. Using (7:3c) and (7:3a) and the orthogonahty 
of r/s, it is soon that the minimum valuo of \r\^ on 
P, is givon by (7:4). By (7:4) and (7 :2a) 



\ri- 



V-V'i^- 






l^^--i| 



Ci 



This completes the proof of theorem 7:1. 

Theorem 7:3. The Rayleigh quotients of Tq, Ti, 
. . . and To, Vi, . . . are connected by the formulas 



M(rO_M(n),M(^f-i) 






in 



\ri\ 



|r,-i 



+ 



+ (-1) 



^) I.,. 19' 



(7:6a) 



(7:6b) 



The Rayleigh quotient of ri{i^O) is smaller than that 
of Vi, that is, /x(rO<M(^i)- 

In order to prove this result we use (5:6d) and 
obtain 

(:ri,Ari) {pi,Ap^) .{pi^i,Api.i) 



ri 



I'^-il 



Since |r^|'^=|^^|%'^|- and iJL{pi) = iJL{ri)j this relation 
yields (7:6a). The oq (7:6b) follows from (7:6a). 
The last statement follows from (5:12). 

In the applications to linear systems arising from 
difference equations approximating boundary value 
problems, /x(ri) can be taken as a measure of the 
smoothness of Xf. The smaller ;Lt(r^) is, the smoother 
Xi is. Hence Xi is smoother than Xi. 

Theorem 7:4. At the point Xi the error function 
f{x) has the value 

/(iO=/(x.) + |?.|^S-^^^^^^^ (7:7) 

j=0 \7j\ 

and we have 

f{xd<J{x,)<j{x,^,). (7:8) 



The sequence J (xo), f{xi), J (X2), . . . decreases mono- 
tonically. 

In order to prove this result, it is convenient to set 



6.-1=1 



rA 



Ci-ilrA 



bt- 



\Ti-i\ Ci\ri_i\ Ci 

By (7:2) we have the relation 

Xi Xi=^Oi—i (iCj_] ^f) • 

Using the formula 

Jix) —f(:Xi) = (x—Xi, A{x—Xi)) , 
which holds for any x in Pf, we see that 

J(Xi)—f{Xi)^b'f_i(x.i_i—Xi,A(Xi.i—Xi)), 



(7:9) 



(7:10) 



that IS, 



J{Xi) -j(:Xi) = M-,lf{Xi_,) -f^Xi)]. (7:11 ) 



By (7:9) it is seen that this result can l)o })iit in the 
form 

j(x^)-j{x,)_j{Xi.,)-f{xd ■ /(^,_0-/(x,,0 

14 I 



Vi\ 



r^t-ii 



in-ll 



Since Xv,=x^ this formula yields the desired relation 
(7:7). Since 6i_i<l, it follows from (7:11) and 
(7:7) that (7:8) holds. This proves the theorem. 

Theorem 7:5. The error vector yi=h—Xi is shorter 
than the error vector yi—h—Xi. Moreover, yt is shorter 
than yi-]. 

The first statement follows from (7:2). It also 
follows from theorem 6:2, since Xi is in S^. By 
(7:2) the point Xi lies in the line segment joining 
Xi to Xi-\. The distance from h to Xi exceeds the 
distance from h to Xi. It follows that as we move 
from Xi to Xi_i the distance from h is increased, as 
was to be proved. 

8. Propagation of Rounding-Off Errors in 
the cg-Method 

In this section we take as basic relations between 
the vectors ro, ri, • • • and Vq, Vij - - - in the cg- 
metliod the following: 



ai- 



i'i+i = 



yi\ 
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-ri — aiApi 



Pt+i = r{+i+biPi. 



(8:1a) 
(8:1b) 
(8:1c) 
(8: Id) 
(8:le) 



As a consequence, we have the orthogonality 
relations 



{ri,r,)=0, {Api,p,) = (i^^k). 



(8:2) 



Because of rounding-oft' errors during a numoi-ical 
calculation (routine), these relations will not be 
satisfied exactly. As the difference \k—i\ increases, 
the error in (8:2) may increase so rapidly that ;/•„ 
will not be as good an estimate of h as desired. This 
error can be lessened in two ways: first, by intro- 
ducing a subsidiary calculation to reduce round ing- 
off errors; and second, by repeating the iteration so 
as to obtain a new estimate. This section will be 
concerned with the first of these and with a study of 
the propagation of the rounding-oft' errors. To this 
end it is convenient to divide the section in four parts, 
the first of which is the following:: 
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8.1. Basic propagation formulas 

In tills part we derive si;niple formulas showing 
how errors in scalar products of the type 



(r^-_i/i), iApi^i,Pi) 



(8:3) 



are propagated during the next step of the computa- 
tion. From (8:le) follows 

Inserting (8:1c) in both terms on the right yields 

(ri,ri+i) = {pi,ri) — a,(pi,Api) — bi,i(pi^i,ri) 

+ bi.iai{Api-i,Pi). 

Applying (8:le) to the first and third terms gives 

(ri,ri^i)={ri,ri)—ai(pi,Api) + bi_iai(Api.i,Pi), (8:4) 
which by (8:1b) becomes 

(ri,ri+i) = bi.iai(Api_i,Pi). 



(8:5) 



This is our first propagation formula. 
Using (8:le) again, 

(Ap{,Pi+i)=Api,ri^i) + bi{Api,pi). 

Inserting (7:1c) in the first term, 



(^p,,p,+i)=---|r,+i|2+ 



1 



ai at 

But in view of (8:1b) and (8: Id) 



(ri,ri+i) + bi(Api,pi), 

(8:6) 



Therefore, 



\ri^-^\^=aibi{Api,pi). 



{Api,pi+i)==— (ri,ri+i), 

(Li 



(8:7) 



(8:8) 



This is our second propagation formula . 

Putting (8:5) and (8:8) togetherjyields the third 
and fourth propagation formulas 



(r,,r.+0=%^^ (Ti-uTi) 



{Api,Pi+i) = bi.i(Api_i,Pi), 
which can be written in the alternate form 

Vij^i + V _^i (^i-l)^\-) 

\r'i\^ ~o 



(Api,pi+i) ^ aj (Api_i,pi) 
{Api,pi) a,_i {Api_i,pi_^) 



(8:9a) 
(8:9b) 

(8:10a) 



by virtue of (8:1b) and (8:ld). Each of these propa- 
gation formulas, and in particular the simple formu- 
las (8:9), can be used to check whether nonvanishing 
products (8:3) are due to normal rounding-off errors 
or to errors of the computer. The formulas (8:10) 
have the following meaning. If we build the sym- 
metric matrix P having the elements (Apt^pjc), the 
left side of (8:10b) is the ratio of two consecutive 
elements in the same line, one located in the main 
diagonal and one on its right hand side. The 
formula (8:10b) gives the change of this ratio as we 
go down the main diagonal. 

8.2. A Stability Condition 

Even if the scalar products (8:2) are not all zero, 
so that the vectors Po, Pi, • • *, Pn-i are not exactly 
conjugate, we may use these vectors for solving 
Ax=k in the following way. The solution h may be 
written in the form 

h=Xo+aoPo+a[pi + ' ' '-{al^^pn-i, (8:11) 

Taking the scalar product with Api, we obtain 

(xo,APi) +12(Api,Pjc)ctl= (h,Api) = (Ah,pi) = (k,pi) 



or 



^(Api,p,)al={ro,Pi) . 



(8:12) 



The system Ax=k may be replaced by this linear 
system for do,- • •,al_i. Therefore, because of 
rounding-off errors we have certainly not solved 
the given system exactly, but we have reached a 
more modest goal, namely, we have transformed the 
given system into the system (8:12), which has a 
dominating main diagonal if rounding-off errors have 
not accumulated too fast. The cg-algorithm gives 
an approximate solution 



h'=Xo + aoPo+' . '+an-lPn-l- 



(8:13) 



A comparison of (8:11) and (8:13) shows that the 
number a^ computed during the cg-process is an 
approximate value of al. 

In order to have a dominating main diagonal in 
the matrix of the system (8:12) the quotients 



{Api,pj c) 
iApi,pi) 



(i9^k) 



(8:14) 



(8:10b) 
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must be small. In particular this must be true for 
k=i-i-l. In this special case we learn from (8:10b) 
that increasing numbers ao, ai,- • • during the cg- 
process lead to accumulation of rounding-off errors, 
because then these quotients increase also. We 
have accordingly the following stability condition. 

The larger the ratios ai/ai-i, the more rapidly the 
rounding-of errors accumulate. 

A more elaborate discussion of the general quotient 
(8:14) gives essentially the same result. 



By theorem 5:5, the scalars at lie on the range 
\ — <^^^'^\ — ' 

^max Amin 

where Xmin, Xmax are the least and the largest eigen- 
values of A. Accordingly, the ratio p == Xmax/Xmin is an 
upper bound of the critical ratio ai/at-i, which deter- 
mines the stability of the process. When p is near 
one, that is, when A is near a multiple of the identity, 
the cg-method is relatively stable. It will be shown 
in section 18 that examples can be constructed in 
which the ratios a^/a^_l {i=l,' • -,71—1) are any 
set of preassigned positive numbers. Thus the 
stability may be low. However, this instability can 
be compensated to a certain extent by starting the 
cg-process with a vector Xq whose residual vector Vq 
is near to the eigenvector of A corresponding to Xmin. 
In this event ao is near to the upper bound l/Xmin of 
the at. This result is brought out in the following 
theorem: 

For a given symmetric and positive definite matrix A, 
which has distinct eigenvalues, there exists always an 
initial residual vector r^ such that (ai/ai-i)<^l and hence 
such that the algorithm is stable with respect to the 
propagation of rounding-off errors. 

In order to prove this we introduce the eigenvalues 

Xaiin^X()<^Xl<\X2\ . . • <CX„_i=Xnrax 

of A, and we take the corresponding normalized 
eigenvectors as a coordinate system. Let ao, a:i, ..., 
an-i be real numbers not equal to zero and e a small 
quantity. Then we start with a residual vector 



--(ao,ai€,a2e^ 



,«„_ie 



'y 



(8:14a) 



Expanding everything in a power series, one finds 
that 



Hence 






tt I _ 1 X i 



(8:14b) 



if € is small enough. 

As a by-product of such a choice of fo we get by 
(8:14b) approximations of the eigenvalues of A. 
Moreover, it turns out that in this case the successive 
residual- vectors Tq, ri, ..., rn-i are approximations of 
the eigenvectors. 

These results suggest the following rule: 

The cg-process should start with a smooth residual 
distribution, that is, one for which fx (ro) is close to X^iin. 
IJ needed, the first estimate can be smoothed by some 
relaxation process. 

Of course, w^e may use for this preparing relaxation 
the cg-process itself, computing the estimates Xi 
given in section 7. A simpler method is to modify 
the cg-process by setting bi=0 so that Pi=ri and 
selecting at of the form ai=a/iji(ri), where a is a 
small constant in the range 0<a:<l. 



8.3. The End-Correction 

The system (8:12) can be solved by ordinary re- 
laxation processes. Introducing the numbers ajc as, 
approximations of the solutions al, we get the resid- 
uals 

(ro,Pi) — '^{Api,p,c)ci,,, (8:15) 

k 

Inasmuch as To =rj+i+ tto^i>o+. . .+^1^^^ by (8:1c), 
we have 

(ro,Pi) = (ri+i,Pi)+ao{Api,po) + . . .+ai(Api,pi). (8:16) 

It follows that the residual (8:15) is'reduced to 

{ri^i,Pi) — {Ap-„pi^i)ai^i—{Api,pr^^2)ai+2—- . • 

-{Ap„p.n-i)an-i. (8:17) 

This leads to the correction of at 

^^'^(Ap7,pi) {(^^+i'i^O-(^ii>z,i>z-fi)^i+i 

— iApuPi^o)ai^2-' . .— (Api,Pn~i)cin-i}' (8:18) 

A first approximalioji of a'i is accordingly 

a-'^at+Aai. 

In order to discuss this result, suppose that the 
numbers a^ have been computed accorduigly to 
the formula 



ai = 



(Pi,ri) 
(PuAPi) 



(8:19) 



(theorem 5:5). From (8:1c) it follows that 
(ri^i,Pi) = 0, and therefore this term drops out in 
(8:18). In this case the correction Aa^- depends 
only on the products {Api,pk) with i<ik- That is 
to say, that this correction is influenced only by the 
rounding-off errors after the i-ili step. If, for 
instance, the rounding-ofl* errors in the last 10 steps 
of a cg-process are small enough to be neglected, 
the last 10 values a^ need not to be corrected. Hence, 
generally, the Aa^ decrease rather rapidly. 

From (8:18) we learn that in order to have a good 
rounding-off behavior, it is not only necessary to 
keep the products {pk,Api) {iy^k) small, but also to 
satisfy (rij^i,pi) = ^ as well as possible. Therefore, 
it may be better to compute the ai from the formulas 
(8:19) rather than from (8:1b). We see this im- 
mediately, if we compare (8:19) with (8:1b); by 
(8:19) and (8:le)wehave 



ai= 



{PuAPi) 



{|ri|2+&^•-l(^^,P^-l)J 



For ill-conditioned matrices, where a^ and bi may 
become considerably larger than 1, the omitting 
of the second summand may cause additional errors. 
For the same reason, it is at least as important in 
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these cases to use formula (3:2b) rather than (8: Id) 
for determming 6^, since by (3:2b) and (8:1c) 



br- 



aiipi,Api) 



'i+i\ 



-(^t+i/i)] 



Here the second summand is not directly made 
zero by any of the two sets of formulas for a^ and 
hi. The only orthogonalit}^ relations, which are 
directly fulfilled in the scope of exactitude of the 
numerical computation by the choice of a^ and bt, 
are the following: 

{ri^i,Pi)=Q, (pi^uApi) = 0. 

Therefore, we have to represent (r^_^i,ri) in terms 
of these scalar products: 

{r^+l,ri) = iri^l,Pi) — bi^l(:ri,Pi.l)+aibi_l{pi,Ap^_l)' 

From this expression we see that for large bi and a^ 
the second and third terms may cause considerable 
rounding-off errors, which afl*ect also the relation 
{pi.^i,Api)^^0, if we use formula (8: Id) for bf. This 
is confirmed by our numerical experiments (sec- 
tion 19). 

From a practical point of view, the following 
formula is more advantageous because it avoids the 
computation of all the products {Api,p}c). From 
(8:1c) follows 



Introducing the correction factor 



rn=ri+i—aiJ^iApi^i—ai+2Api+2— 



and we have the result 



Aai = 



(rn,Pi) , 
{Api,pi) 



— an-iApn-1 



-an+i{Api,Pn-i). 



(8:20) 



This formula gives corrections of the at if, because 
of rounding-off errors, the residual /•„ is not small 
enough. 

8.4. Refinement of the cg-algorithm 

In order to diminish rounding-off errors in the 
orthogonality of the residuals r^ we refine our general 
routine (8:1). After the ith step in the routine we 
compute (Api-i,pi), which should be small. Going 
then to the (i-fl)s^ step we replace a^ by a slightly 
different quantity at chosen so that (ri,ri^i)=0. In 
order to perform this, we may use (8:4), which now 
must be written 

(^t/z+i) = {ri,ri) — ai{Api,pi)+bi.iai{Api_i,pi)=0 

yielding 



\ri 



di=l — bi- 



(Api_r,pi) 
' (Api^Pi) 



(8:21) 



{Api,pi)—bi-i{Api_^,Pi) 



and taking into account the old value (8:1b) of a^, 
this can be written in the form 



at 



(8:22) 



Continuing in the general routine of the ('i+l)st step 
we replace bi by a number hi in such a way that 
{Api,pi^i)=0. We use (8:6), which now must be 
written in the form 

— |rf+i]2+ (ri,ri^i)+aibi{Api,pi) =0. 

The term (ri,ri+i) vanishes by virtue of our choice 
of "di. Using (8:7), we see that aibi=a$i and from 

(8:22) 

bi=b4i- (8:23) 

Since rounding-off errors occur again, this sub- 
routine can be used in the same way to improve the 
results in the (i+l)th step. 

The corrections just described can be incorporated 
automatically in the general routine by replacing the 
formulas (3:1) by the following refinement: 

VQ=rQ='k — AxQ, do=l 



ai=7 



\ri\ 



(Pi,Api) di 

Xi^l = Xi-\-(liPi 

ri^i = ri-aiApi 



(8:24) 



6.= 



di 



Vi+i = r 
dt+i=l — bi 



Til 

i^i + biPi 



{Api^i,Pi) 
{Api+i,Pi+{) 



Another quite obvious, but numerically more 
laborious method of refinement goes along the 
following lines. After finishing the -ith step, compute 
a product of the type {Api,pu) with k<ii. Then 
replace pi by 



__{Ap^^ 



(8:25) 



The vector pt is exactly conjugate to Pk- This 
method may be used in place of (8:24) in case 
k=i—l. It has the disadvantage that a vector must 
be corrected and not just numbers, as in (8:24). 
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9. Modifications of the cg-Process 

In the cg-method given by (3:1), the lengths of the 
vectors 'p^^ Pi, - - - are at our disposal. In order to 
preserve significant figures in computations, it is 
desirable to have all the pt of about the same magni- 
tude. In order to aim at this goal, we normalized 
the ^^s by the formulas 

Po=ro, Pi=ri+bi^iPi^i (^>0)- 

However other normalizations can be made. In 
order to see how this normalization appears, we 
replace pt by dfPi in eq (3:1), where df is a scalar 
factor. This factor di is not the same as that given 
in section 8 but plays a similar role. The result is 



ro=k—Axo, 




Xi+l = X^ + a^Pi 




ri+i=ri—aiApi 




Ti^i+biPi 
Pi+i~ J 




. _ Vi? 


_ (Vurd 


' (pi,Api)di 


(PiAvi) 


, _|r,+i|2cZ,_ 


{ri^i,Api) 



(9:1) 



^i 



(Pu^Pi) 



The connections between a^, bi, di are given by the 
equation 



m(^z)- 






(^>0), (9:2) 



where jLt(r) is the Kayleigh quotient (4:12). In order 
to establish these relations we use the fact that r^- 
and r^+i are orthogonal. This yields 



\ri\^=ai{ri,Api) 

\ri+i\^= — ai(ri+i,Api) 



(i^O) 



by virtue of the formula ri^i=ri—aiApi. From the 
connection between Pi and Vf, we find that 



di 



- \ri\^=di(ri,Api) 

U/i 

= {ri,Ari)+bi^i(ri,Api^i) 

=(r„Ari)-^ ir.f . 

This yields (9:2) in case '^>0. The formula, when 
i=0, follows similarly. 

In the formulas (9:1) the scalar factor di is an 
arbitrary positive number determining the length of 
Pi. The case di=l is discussed in sections 3 and 5. 
The following cases are of interest. 



I. The vector Pi can be chosen to be the residual 
vector Vi described in section 7. 
In this event we select 



do=l, di^i = l+bi 
The formula (7:2b) for x^+i becomes 



Xi^l — - 



1 + 6, 



(9:3) 



(9:4) 



11. The vector Pi can be chosen so that the formula 

holds. 

In this event the basic formulas (9:1) take the 
simple form 



ro=k—Axo 



Xi^l = Xi- 



r<+i = rr 



Po= 



Pi 






Pi+i=Vi-^ 



(P^Pi) 
Api 

{PiAPi) 



(9:5) 



Fi+i| 



This result is obtained from (9:1) by choosing 

di^\ri\\ 

In this case the formulas (9:5) are very simple and 
are particularly adaptable to computation. It has 
the disadvantage that the vectors pi may grow con- 
siderably in length, as can be seen from the relations 

\Pi^i?=\Pi?- 



However, if ^'floating" operations are used, this 
should present no difficulty. 

III. The vector pi can be chosen to be the correction 
to be added to Xi in the {i-\-l)st relaxation. 

In this event, ai=l and the formulas (9:1) take 
the form 



7*0 — Ic AXqj 

Xi^l = Xi-\-pi 

ri^i=ri—Api 
ri+i + biPi 



^-s 



(9:6) 



Pt+i 



dijfi 

di+i-=iJ>(ri+i)~bi 



'i + U 



rt 



di. 



2274-10—52- 
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These relations are obtained from (9:1) and (9:2) by 
setting ai=l. 

TV. The vector pi can be chosen so that a^ is the recip- 
rocal of the Eayleigh quotient of ri. 

The formulas for Oi, 6^ and di in (9:1) then become 



a^= 



l^^-l 



(ri,Ari) 



Fi+i\ 



di 



do — 1; dij^i — 1 ■ 



btai^ 



This is sufficient to indicate the variet3^ of choices 
that can be made for the scalar factor do, di, . , . . 
For purposes of computation the choice di = 1 appears 
to be the simplest, all things considered. 

10. Extensions of the cg-Method 

In the preceding pages we have assumed that the 
matrix ^ is a positive definite symmetric matrix. 
The algorithm (3:1) still holds when A is nonnegative 
and symmetric. The routine will terminate when 
one of the following situations is met: 

(1) The residual Vm is zero. In this event x^ is 
a solution of Ax=k, and the problem is solved. 

(2) The residual r^i is different from zero but 
(Ap,m,Pm)=0, and hence Aprn=0. Since pi^ari, 
it follows that Ar^i = Q, where r^ is the residual of the 
vector x,n defined in section 7. The point Xm is 
accordingly a point at which \k—Ax[^ attains its 
minimum. In other words, Xm is a least-square 
solution. One should observe that Pm^O (and hence 
r^F^O). Otherwise, we would have rm=—hm-iPm-u 
contrary to the fact that Vm is orthogonal to Pm-i- 
The point Xm fails to minimize the function 

g{:x) = {x,Ax) — 2{k,x), 

for in this event 

giXm + tPm)=-g{Xm) — 2t\rm\\ 

In fact, g{x) fails to have a minimum value. 

It remains to consider the case when ^ is a general 
n on singular matrix. In this event we observe that 
the matrix A^A is symmetric and that the system 
Ax=k is equivalent to the system 



A'^Ax=A'^k, 



(10:1) 



Applying the eq (3:1) to this last system, we obtain 
the following iteration, 



ro=k- 



■ AXoy P0 = 



■A^ro, 



at- 



\Api 



Xi+i — Xi-f-aiPi 
ri+i = ri — aiApi 



(10:2) 



Pi+i = A'^ri+i + biPi. 



If one does not wish to use any properties of the 
cg-method in the computation of a^- and bt besides 
the defining relations, since they may be disturbed 
by rounding-off errors, one should use the formulas 

_{Api,Ti) 



br- 



\Ap,\' 

(Api,AA''ri+i) 

\Ap^ 



In this case the error function f(x) is the function 
f{x) = \k—Ax\^, and hence is the squared residual. 
It is a simple matter to interpret the results given 
above for this new system. 

It should be emphasized that, even though the use 
of the system (10:2) is equivalent from a theoretical 
point of view to applying the cg-algorithm to the 
system (10:1), the two methods are not equivalent 
from a numerical point of view. This follows because 
roun ding-oft' errors in the two methods are not the 
same. The system (10:2) is the better of the two, 
because at all times one uses the original matrix A 
instead of the computed matrix A'^A, which will 
contain rounding-off errors. 

There is a slight generalization of the system (10 : 2) 
that is worthy of note. This generalization consists 
of selecting a matrix B such that BA is positive defi- 
nite and symmetric. The matrix B is necessarily of 
the form A'^H, where H is positive definite and 
symmetric. We can apply the cg-algorithm to the 
system 

BAx=Bk. (10:3) 

In place of (10:2) one obtains the algorithm 



''' (puBAp,)' 

Xi^i = Xi-\-aiPi, 
Ti+i^ri — aiApi, 



Vo-- 



--Br. 



(10:4) 



br- 



\BrA' ' 



Pi+i=Bri^i + biPi. 

Again the formulas for a,- and 6^, which are given 
directly by the defining relations, are 



ai = 



br- 



iPuBApi) 
(Bri^„BApi) 



(Pi,BApi) 

When B=A'^, this system reduces to (10:2). If A 
is symmetric and positive definite, the choice B=I 
gives the original cg-algorithm. 
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There is a generalization of the cd-algorithin con- 
cerning which a few remarks should be made. In 
this method we select vectors ^^o, • • -, Vn-i and 
^0, . . ., q.n-\ such that 






(10:5) 



The solution can be obtained by the recursion for- 
mulas 



ar- 



'{qi,Api) {qi,Apiy 
ri^i = ri — aiApi. 



(10:6) 



The problem is tlu^n renin ('(m1 to finding the vectors 
j)i, gj such that (10:5) holds. We shall show^ in a 
moment that g^ is of the form 



-J^^'P^, 



(10:7) 



where J^ has thc^ ]3ropei'ty that /M is symmeti-ic and 
positive definite. The algorithm (10 : 6) is accordingly 
equivalent to applving the cd-algorithm to (10:3). 
To see that qt is of the form (10:7), let P be the 
matrix whose column vectors are po? • • -, Vn-\ and Q 
be the matrix whose column vectors are q^, . . ., qn-\- 
The con(liti(m (10:5) is equivalent to the statement 
that the matrix D^Q'^AP is a diagonal matrix whose 
diagonal terms are positive. Select B so that 
Q=B *P. Then D=P''BAP from which w^e conclude 
that BA is a positive definite symmetric matrix, as 
was to be proved. 

In view of the results just obtained, we see that 
the algorithm (10:4) is the most general cg-algorithm 
for any linear system. Similarly, the most general 
cd-algorithm is obtained: by (i) selecting a matrix 
B such that BA is symmetric and positive definite, 
(ii) selecting nonzero vectors ^^o, • • •? Vn-\ such that 

(p,,5^2;,)-0, {i^j) 

and (iii) , using the recursion formulas 

^ {pi,Bri) ^ {Pi,Bro) 
''' ip^BApd {pi,BAp,) 

Xi^i=Xi-\-ai2^i 

ri^i=Ti — aiA'Pi. 

11. Construction of Mutually Conjugate 
Systems 

As was remarked in section 4 the cd-method is not 
complete until a method of constructing a set of 
mutually conjugate vectors 2>o, Vi^ - - - ^^^s been 



given. In the cg-method the choice of the vector 
Pi depended on the result obtained in the pievious 
step. The vectors ^o, ^i, • • • are accordingly deter- 
mined by the starting ])oint .Tq and vary with the 
point 2o. 

Assume again that A is a positive definite, sym- 
metric matrix. In a cd-method the vectors y>o, 
2?i, . . . can be chosen to be independent of the 
starting point. This can be done, for example, by 
starting with a set of n linearly independent vectors 
Uq, Ui, . . ., Un-\ and constructing conjugate vectors 
by a successive ^-orthogeonalization process. For 
example, we may use the formulas 



vi-- 

V2-- 



--Ui — aioPo, 



oc-iiPu 



(ll:l) 



2yi = Ui— atoPi)— anpi — . 



-iPi-i- 



The coellicient aij{iy>j) is to be chosen so that pi is 
conjugate to pj. The formula for atj is evidently 



0^ij=/ 



_{Ui,Apj) 



(j<i)^ 



Observe that 



(VjAPj) 
{pi,Auj)=0 (j<:i) 
(Pi,Aui) = {pi,Api). 
Using (11:3) we see that alternately 

(Aui,pj) 



OCij = 



(Auj,Pj) 



(11:2) 



(11:3) 



(11:4) 



As described in section 4, the successive estimates of 
the solution are given by the recursion formula 



where 



a;o=0, 



Xi^i — Xi-\-aiPi, 



iViAVr) 



(11:5) 
(11:6) 



There is a second method of computing the 
vectors p^, pi, . . ., Pn-\, given by the recursion 
formulas 

(11:7a) 

(11:7b) 

(11:7c) 



U^=Ui 



Pj==u;^ 



w 



^u^'^-atjpj, 



(?;=i+i, . . .,n) 



«zr 



_{u^''\Auj) 

(Pj.Ay^j) 



(^>J*). 



(ll:7d) 
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We have the relations (11:3) and, 

(uj'\Auj) = (j<k) (11:8a) 

{ul^\Apj)=0 {j<k) (11:8b) 

ul^^=Ui—aioPo—' . . — ai,j^ipj^i (i>j) (11:8c) 

{pi,Apj) = (iVj). (ll:8d) 

The eq (11:8a) hold when k=j+l by virtue of 
(11:7c) and (ll:7d). That they hold for other 
values of j<ik follows by induction. Equation 
(11:8c) follows from (11:7c). 

If one selects successively Uo=ro, Ui=ri, . . ., 
Un-i=rn-ij the procedure just described is equivalent 
to the cg-method described in section 3, in the sense 
that the same estimates Xq, Xi, . . . and the same 
direction vectors po, Pu . . . are obtained. If 
one selects Uo=k, Ui=Ak, . . ., Un-i—A^~^k, one 
again obtains the same estimates Xq, 2'i, . . . as in 
the cg-method with a;o=0. However in this event 
the vectors po^ pi, . . . are multiplied by nonzero 
scalar factors. On the other hand if one selects 

1^0= (1,0,. . .,0), %=-(0,l, . . .,0), . . ., Un-l = {0, 

, . .,0,1) the cd-method is equivalent to the Gauss 
elimination method. This case will be discussed in 
the next section. 

12. Connections With the Gauss Elimina- 
tion Method ^^ 

In the present section it will be convenient to 
use the range 1, . . ., n in place of 0, 1, . . ., n—l 
used heretofore, except for the notations o^o, Xi, . . ., 
Xn describing the successive estimates of the solution. 
Let ei, , , ., Cn be the unit vectors (1,0, . . .,0), 
(0,1,0,. . .,0), . . ., (0,. . .,0,1). These vectors 
will play the role of the vectors 'Uq, . . ., Un-i of 
section 11. The eq (11:7), together with (11:4) and 
(11:5), yield the recursion formulas 



ul''=ei 



(i=l,, . ,^n) 



Pi=ui^^ 



Ul^ + V^UI^K 



-atjPj 



OCiJ 



3^0=0, 



ai= 



(Apj^ej) 
Xi=Xt^i+aiPi 



,,n) 



{Api,ei) 



(12:1a) 
(12:1b) 
(12:1c) 

(12: Id) 

(12:le) 

(12:lf) 



These formulas generate mutually conjugate vectors 
Pi, , . ,, pn and corresponding estimates Xi^ . . ., x^ 
of the solution of Ax=k. In particular Xn is the 
desired solution. The advantage of this method 
lies in the ease with which the inner products appear- 



L 12 cf. Fox, Huskey, and Wilkinson, loc. cit. 



ing in (12: Id) and (12: If) can be computed. A 
systematic scheme for carrying out the computations 
will now be given. The scheme is that commonly 
used in elimination. In the presentation that we 
now give, extraneous entries will be kept so as to 
give the reader a clear picture of the results obtained. 
We begin by writing the matrices A, I and the 
vector Z: as a single matrix 



an 


«12 


^13 . 


. . am 


1 





. 


. . 


ki 


«^21 


«22 


G^23 . 


. . a2n 





1 


. 


. 


h 


^31 


<^32 


0^33 . 


. . dzn 








1 . 


. 


. 



(^n\ (^n2 CinS • • • ^«n ... 

The vector pi is the vector (1,0, . . 
kifaii is defined by (12:lf). Hence, 

Xi = aipi = (--^y 0, . . .,0 ) 
is our first estimate. Observe also that 



(12:2) 



. 

1 kn. 
, 0), and ai = 



Oiil = 



Oh 



Multiplying the first row by an and subtracting the 
result from the ^th row (^=2, • • • ,n), we obtain the 
new matrix 



an 


ai2 


^13 . 


. . am 


2^11 





ft (2) 

a 22 


^(2) 


^(2) 
• • (^2n 


2^21 





^32 


^(2) 
a 33 


^(2) 


uif 



V2n 



kr 



rj{2) 

^ n2 



aZ 



<' 



uk, 



K' 



(12:3) 

One should observe that (0,A:f ,. . .,¥n) is the residual 
of Xi. By the procedure just described the 'ith row 
(i>l) of the identity matrix has been replaced by 
u^^^ , the second row yielding the vector ^2='?^^2^ . Ob- 
serve also that 



a,f-(^tzf,e,) 



n(.2) — 
"'22 



{Ap2,e2), 



{i=2, . . ., n) 
(V2,k), 



1.(2) 
H 2 



Hence, 



h(2) 

+ /t 2 



is the next estimate of the solution. Moreover, 

a ^^^ 
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Next multiply the 2nd row of (12 : 3) by a-a and sub- 
tract the result from the i\h row ('<f=3/--,7i). We ob- 
tain 



an 


Gl2 


0^13 





CI22 


^(2) 








«ir 








air 



(2) 



a 



2n 



a 



(3) 






Vxn 


ki 


Pin 


k'f 


Vin 


kf 


< 


k'-f 







% 



(3) 



/:f. 



The vector (0,0,^:^3^. . .,^^^0 is the residual of X2. 
The elements u^f ,, . .,ulf^ form the vector uf{i = d) 
We have 



and p^=uf 



a!f=(Auf,e,) 
We have accordingly 



(^ = 3,. . .,n) 



and 



L(3) 
I "^3 

W'33 



^(3) 
"'33 



Proceeding in this manner, we finally obtain a matrix 
of the form 



«]1 


ai2 


<^13 • 


. . am 


Pll . 


. . Pm 


/^l 





a^f 


«'23 


• • ^3n 


P2I . 


. . P2n 


/^r 








^(3) 


• • "'3n 


P31 . 


. • P3« 


tl^3> 







P;^l 



Pnn 



(12:4) 



The elements fn, • • •, ^1,^ define a vector pi. The 
vectors Pi, - - -■, Pn are the mutually conjugate 
vectors defined by the iteration (12:1). At each 
stage 

^i^ = ^ (i^i+l, . . .,71) 

a^^ = {pi,Ap^), k^''^ = (pi,k) = {pi,ri). 

Moreover the estimate Xi of the solution h is given 
by the formula 

J^(t•) 

(i) Pi- 

The vector 0, • • •, 0, a/j\ • • •, a/;^ defined by the 
first n elements in the itli row of (12:4) is the vector 
Api. If we denote by P tlic matrix whose column 



vectors are pi, p2, 
is the matrix 



', Pn, then the matrix (12:4) 
|P*^ P* P^k\\. 



The matrices P*A and P are triangular matrices 
with zeros below the diagonal. The matrix D=P*AP 
is the diagonal matrix whose diagonal elements are 
an, aif , . . ., a^V- The determinant of Pis unity and 
the determinant of A is the product 

n n ^^^ ^ (n ) 

As was seen in section 4, if we let 



J(x) = (h—x,A(h—x)), 



the sequence 



fMJiXi), . . .,fiXn.-i)J(Xn)=0 

decreases monotoiiically. No general statement 
can be made for the sequence 

I2/0LI2/1I, . . ',\yn-i\,\yn\=o 

of lengths of the error vectors yi = h—Xi. In fact, 
we shall show that this sequence can increase mono- 
tonically, except for the last step. A situation of 
this type cannot arise when the cg-proccss is used. 

If A is nonsymmetric, the interpretation given 
above must be modified somewhat. An analysis of 
the method will show that one finds implicitly two 
triangular matrices P and Q such that Q*AP is a 
diagonal matrix. To carry out tliis process, it may 
be necessary to interchange rows of A. By virtue 
of the remarks in section 10, the matrix Q* is of the 
form B*P. The general procedure is therefore equiv- 
alent to application of the above process to the svs- 
tem (10:3). 

13. An Example 

In the cg-method the estimates Xo,a;i, ... of the solu- 
tion h of Ax=k have the property that the error 
vectors yQ=h—XQ, yi = h—Xi, , . . are decreased in 
length at each step. This property is not enjoyed 
by every cd-method. In this section we construct 
an example such that, for the estimates Xq=^^,Xi, , . ., 
of the elimination method, 

\h—Xi.i\<C\h—Xi\ {i=l, . . .,n—l). 

If the order of elimination is changed, this property 
may not be preserved. 

The example we shall give is geometrical instead 
of Dumerical. Start with an (/i—l) -dimensional 
ellipsoid E^ with center Xn=h and with axes of un- 
equal length. Draw a chord Cn through Xn, which 
is not orthogonal to an axis of En. Select a point 
Xn-i7^Xn on this chord inside En, and pass a hyper- 
plane Pn-i through Xn-i conjugate to Cn, that is, 
parallel to the plane determined by the midpoints 
of the chords of En parallel to Cn- Let en be a unit 
vector normal to P„_i. It is clear that Cn is not 
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parallel to Cn- The plane Pn-\ can be shown to cut 
En in an (ti— 2) -dimensional ellipsoid En-i with cen- 
ter at Xn-\ and with axes of unequal length. 

Next draw a chord Cn-\ of En-\ tliroughx^_i which 
is not orthogonal to an axis of En-i, and which is not 
perpendicular to h—Xn-i- One can then select a 
point Xn-2 on Cn-i which is nearer to hXh^nXn-i- Let 
Pn-2 be the hyperplane through Xn-2 conjugate to 
Cn-i' It intersects En-\ in an (n— 3) -dimensional 
ellipsoid En-2 with center at Xn-2' The axes oiEn-2 
can be shown to be of unequal lengths. Let Cn-i be 
a unit vector in P„_i perpendicular to Pn-2' 

We now repeat the construction made in the last 
paragraph. Select a chord Cn-2 of En-2 through 
Xn-2 that is not orthogonal to an axis of En-2 and that 
is not perpendicular to h—Xn-2' Select Xn-^ on Cn-2 
nearer to h than Xn-2y and let Pn-z be a plane through 
Xn-z conjugate to Cn-2' It cuts En-2 in an (n— 4)- 
dimensional ellipsoid En-z with center at Xn-z with 
axes of unequal lengths. Let en-2 be a unit vector in 
Pn-i and Pn-2 perpendicular to Pn-z- Clearly, 6„, 
en-\, en-2 are mutually perpendicular. 

Proceeding in this manner, we can construct 

(1) Chords Cn, Cn-i, . . -, Ci, which are mutually 
conjugate. 

(2) Planes Pn-u • • •, ^i such that P^ is conjugate 
to Cjc^i. The chords Ci, . . ., Ca: lie in Pf,. 

(3) The intersection of the planes Pn-i, • • •, Pi, 
which cuts En in a (i^—1) -dimensional ellipsoid Ej^ 
with center Xjc. 

(4) The point Xi, which 
^<7^— 1. 

(5) The unit vectors en, 
direction of (7i), which are mutually orthogonal. 

Let Xq be an arbitrary point on Ci that is nearer to 
h than Xy. Select a coordinate system with Xq as the 
origin and with ^i, . . ., 6„ as the axes. In this co- 
ordinate system the elimination method described 
in the last section will yield as successive estimates 
the points Xi, . . ., Xn described above. These esti- 
mates have the property that Xf is closer to Xn=h 
than Xj+i if i<in—l. 

As a consequence of the construction just made we 
see that, given a set of mutually conjugate vectors 
Pi, . . ., Pn and a starting point Xq, one can always 
choose a coordinate system such that the elimination 
method will generate the vectors pi, . . ., Pn (apart 
from scalar factors) and will generate the same esti- 
mates Xi, . . ., X;^ of A as the cd-method determined by 
these data. One needs only to select the origin at 
Xq, the vector ei parallel to pi, the vector ^2 in. the 
plane of pi and p2 and perpendicular to ei, the vector 
es in the plane of pi, p2, Pz and perpendicular to ^i 
and e2, and so on. This result may have no practi- 
cale value, but it does serve to clarify the relation- 
ship between the elimination method and the cd- 
method, and also the relationship between the 
elimination method and the cg-method. 

14. A Duality Between Orthogonal Poly- 
nomials and n-Dimensional Geometry 

The method of conjugate gradients is related to 
the theory of orthogonal polynomials and to con- 



is closer to h than Ji+i, 
^2, ^1 (with ei m the 



tinned fraction expansions. To develop this, we 
first study connections between orthogonal poly- 
nomials and 71-dimensional geometry. 

Let m(X) be a nonnegative and nondecreasing 
function on the interval 0<X<;. The (Riemann) 
Stieltjes integral 



j; 



mdm{\) 



then exists for any continuous function /(X) on 
0<\<l. We call m(X) a mass distribution on the 
positive X-axis. The following two cases must be 
distinguished. 

(a) The function m(X) has infinitely many points 
of increase on 0<X<^/. 

(b) There are only a finite number n of points of 
increase. In both cases we may construct by 
orthogonalization of the successive powers 1, X, X^, 
• • ♦, X"" a set of 71+ 1 orthogonal polynomials ^^ 



i?o(X),i?i(X), . . ',Rn{\) 



(14:1) 



with respect to the mass distribution. One has 

r i?,(X)/?,(X)rfm(X) = {ir^ky (14:2) 

The polynomial Ri(\) is of degree i. In case (b), 
RnW is a polynomial of degree n having its zeros at 
the n points of increase of m(X). In both cases the 
zeros of each of the pol^momials (14:1) are real and 
distinct and located inside the interval (0,/). Hence 
we may normalize the pol^^nomials so that 



i?,(0)-l (^=l, 



n). 



(14:3) 



The polynomials (14 : 1) are then uniquely determined 
by the mass distribution. 

During the following investigations we use the 
Gauss mechanical quadrature as a basic tool. It can 
be described as follows: If Xi, • • -jX^ denote the 
zeros of RnW, there exist positive weight coefficients 
//?!, m,2, • • ',mn such that. 



j: 



/?(X)(/m(X) = miB(Xi) + m2Ri\2) - 



. + mnRO^n) 
(14:4) 



whenever R(\) is a polynomial of degree at most 
2n— 1. In the special case b) the X^: are the abscissas 
where m(X) jumps and the m^ the corresponding 
jump. 

In order to establish the duality mentioned in the 
title of this section, we construct a positive definite 
matrix A having Xi, X2, • • •, X^^ as eigenvalues (for 
instance, the diagonal matrix having X], • • •, X^^ in 
the main diagonal and zeros elsewhere). Further- 
more, if Ci, €2, • • •, e-n are the normalized eigen- 
vectors of A, we introduce the vector 



^0=0^1^1 + ^2^2 + 



'T ^ti^nj 



(14:5) 



'3 The various properties of orthogonal polynomials used in this chapter may be 
found in G. Szego, Orthogonal Polynomials, American Mathematical Society 
Colloquium Publications 23 (1939). 
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where 



cti=Mi 



{i=\, 



I Again we may restrict ourselves to the powers 
-^n). (14:6) 1, X, . . . ,X'^~^ That is, we must show that 



We then have 



+ OLn\nen (14:7) 



for /:— 0,1, • • •, n—l. The vectors Tq^At^, • • •, 
A^~Vp arc linearly independent and will be used as a 
coordinate system. Indeed their determinant is up 
to the factor q:]Q!2 • • • a^ Van der Monde's determi- 
nant of Xi, • • •,X,i. B}^ the correspondence 



X,-^AVo (/:-0,l, 



■1) 



(14:8) 



every polynomial of maximal degree ?7 — 1 is mapped 
onto a vector of the 77.-dimensional space and a one- 
one correspondence between these polynomials and 
vectors is established. The correspondence has the 
following properties: 

Theorem 14:1. Let the space of polynomials Ji(\) 
of degree <n — l he metrized by the norm 



||i2||=[j]A'(X)Vm(X)]' 



Then the correspondence described above is isometric, 
that is. 



f 

JO 



R(\)R'(\)d7n(\) = (r,r'), 



where 7?(X), R'{\) are the polynomials corresponding 
to r and r\ 

It is sufficient to prove this for the powers 1, X, 
X^, . . ., X""^ Let X^', X'^ be two of these powers. 
From Gauss' formula (14:4) follows 

f \^\'dm{\)= [ V+'dm{\) 
Jo Jo 

= miXi+^ + m2Xi+^' + . . .+m;,Xl+^ 

But (14:5), (14:6), and (14:7) show that this is 
exactly the scalar product {A^ro,A^'rQ) of the cor- 
responding vectors. 

Theorem 14:2. Let the space of polynomials 7?(X) 
oj degree <n—l be metrized by the norm 



[ CR{\Y\dm(\)]\ 



Then for polynomials R(\),R/(\) corresponding to 
ry one has 



j: 



R (\)R' (X) \dm (\) = (Ar, /), (14:9) 



that is, the correspondence is isometric with respect to 
the weight fwnction \dm(\) and the metric, determined 
by the matrix A. 



X' 



V^-'V-'dm (X) = (^^"+iro, -4%) 



{j,k<n-l). 

(14:10) 



If j<^n — \, this has already been verified. The 
remaining case 



r 

Jo 



V'\hlm (X) = (.4^ro, ^%) {k<n- 1) (14:1 1) 



follows in the same manner from Gauss' integration 
formula, since n-\-k<2n—\. 

Theorem 14:3. Let A be a positive definite sym- 
metric matrix with distinct eigenvalues and let r^ be a 
vector that is not perpendicular to an eigenvector of A. 
There is a mass distribution m(\) related to A as 
described above. 

In order to prove this I'c^sult let Ci, . . ., e^ be 
the noi'maliz(Ml eigenvectors of A and let Xi, . . ., 
X^ be the corrc^sponding (})ositive) eigenvalues. The 
vector fo is expressible in tbe form (14:5). According 
to our assumption no a^: vanishes. The desired mass 
distribution can be constructed as a step function 
that is constant on each of the intervals 0<Xi<X2< 
• • • <C^n<^l, and having a jump at X^; of the amount 
7^^=a^>0, the number / being any number greater 
than X,,. 

We want to emphasize the following property of 
our correspondence. If A and /q are givc^n, we are 
able to establish the corredence without computivg 
eigenvalues of A. This follows immediately from the 
basic relation (14:8). Moreover, we mv able to com- 
pute integrals of the type 

Cr (X) ir (X) dm (X), Cr (X) R' (X) \dm (X), 

Jo Jo 

(14:12) 

where R, W are polynomials of maximal degree 
n—l without constructing the mass distribution. 
Indeed, the integrals are equal to the corresponding 
scalar products (r,r^)j {Ar,r^) of the corresponding 
vectors, by virtue of theorems 14: 1 and 14:2. Finally, 
the same is true for the construction of the orthog- 
onal polynomials /?o(X), i?i(X), . . ., RnW because 
the construction only involves the computation of 
integrals of the type (14:12). The corresponding 
vectors r^, ri, . . ., r«_i build an orthogonal basis in 
the Euclidian n-space. 

15. An Algorithm for Orthogonalization 

In order to obtain the ortliogonalization of poly- 
nomials, the foUowuig method can be used. For 
any three consecutive orthogonal polynomials the 
recurrence relation holds : 



R^+l(\) = (d,-a^\)R,(\)-c,Ri_,(\) 



Ro=l, Cq=0, 
(15:1) 
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where at, Ct, di are real numbers and at^O. Taking 
into account the normahzation (14:3), we have 

l=di-c^. (15:2) 

Hence 

i?,+i(X) = (l+c,-a,X)Z?,(X)-c,i?,_i(X). 

This relation can be wi-itten 

ath at A 

From this equation it is seen by induction that 



pm- 



R 



^• + l^ 



-Ri 



aA 



(15:3) 



are polynomials of degree i. Introducing the num- 
bers 



6.-1= 



Cfdi — l 

; 

at 



b-i = 



we have 



P,(X)=i?,(X) + 6,.iP,_i(X) 
P,+i(X)=P,(X)-a,XP,(X). 



(15:4) 

(15:5a) 

(15:5b) 



Beginning with Po=l, we are able to compute by 
(15:5) successively the polynomials Pq=Ro=1, Ri, 
Pi, R2, P2, . . ., provided that we know the numbers 
at, hi. In order to compute them, observe first the 
relation 



/; 



P,{\)P^{\)\dm{\)=^ {i9^k), (15:6) 



Indeed this integral is up to a constant factor 



/: 



(Ri+,-Ri)P,dm(\). 



For k<Ci this is zero, because the second factor is of 
degree k<Ci- 

Using (15:5a) and (15:6), we obtain 

r Ri{\)Pi(\)\dm{\)= C Pi(\y\dm{\). 

Combining this result with the orthogonality of 
i?i+i and Ri, we see, by (15:5b), that 



ai== 



rRi{\ydm{\) 



(15:7) 



f Pi{\y\dm{\) 
Jo 

Using (15:6) and (15:5a), 
0= r 7?,(X)P,_i(X)Xc?m(X) + 6,_i C Pi_,(\y\dm{\), 



Applying (15:3) to the first term yields 

-^ r Ri{\ydm(K)=bi.i r Pi_^{\yUm{\) 

^i-l Jo Jo 

Combining this result with (15:7), we obtain 

f Ri{\ydm(k) 
Jo 



bi-i= 



£^'-' 



{\ydm(\) 



(15:8) 



The formulas (15:5), (15:7), (15:8), together with 
Po=lj &-i = 0, complete^ determine the polynomials 

Ro) Rl) • • -y Rn-l- 

16. A New Approach to the cg-Method, 
Eigenvalues 

In order to solve the system Ax=k, we compute the 
residual vector rQ=k—AxQ of an initial estimate Xq of 
the solution h and establish the correspondence based 
on A, To described in Theorem 14:3. Without com- 
puting the mass distribution, the orthogonalization 
process of the last section may be carried out by 
(15:5), (15:7) and (15:8) with Po=l, 6_i-=0. The 
vectors r^, pt corresponding to the polynominals 
Ri, Fi are therefore determined by the recurrence 
relations 

Pi=ri+bi^iPi,i, ri+i = ri—aiApi. (16:1) 

Multiplication by X in the domain of polynominals 
is mapped by our correspondence into applying A 
in the vector space according to (14:11). In fact, 

Pi=Pi{A)ro, Ti=Ri{A)n (i==0, 1, . . .,n-l). 

The numbers ttx, 6^ are computed by (15:7) and (15:8). 
Using the isometric properties described in theorems 
14:1 and 14:2, we find that 



ai= 



\ri\ 



i^Pi,Pi) 



bi-i= 



ri 



rz-^l 



The vectors r,- are orthogonal, and the pi are con- 
jugate; the latter result follows from (15:6). Hence 
the basic formulas and properties of the cg-method 
listed in sections 3 and 5 are established. It remains 
to prove that the method gives the exact solution 
after n steps. If we set Xi+-i=Xi+aiPi, the corre- 
sponding residual is r^+i as follows by induction: 

k—Axi+i=(k—Axi)—aiApi=ri—aiApi=riJ^i. 

For the last residual fn we have (i—0, 1, . . . .,7^—1) 

(^n/z) = (rn^iri)—an-i(Apn-i,ri) 

= Rn-iRidm — an-i Pn-iRi\dm 

= RnRidm = 0. 
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Our basic method reestablishes also the methods 
of C. Lanczos for computing the characteristic poly- 
nomial of a given matrix A. Indeed the polynomials 
Ri, computed by the recurrence relation (15:5), lead 
finally to the polynomial B,,{\), which, by the basic 
definition of the correspondence in section 14, is the 
characteristic polynomial of A, provided that Tq satis- 
fies the conditions given in theorem 14:3. It m.ay 
be remembered that orthogonal polynomials build a 
Sturmian sequence. Therefore, the polynomials 
7?o, 7?i, . . ., Rn build a Sturmian sequence for the 
eigenvalues of the given 7natrii A. 

Our correspondence allows us to translate every 
method or result in the vector-space into an anal- 
ogous method or result for polynomials and vice 
versa. Let us take as an example the smoothing 
process in section 7. It is easy to show that the 
vector Ti introduced in that section corresponds to a 
polynomial__/?z(X) characterized by the following 
property: Rii\) is the polynomial of degree i with 
i?^(0) = l having the least-square integral on (0,Z)- 
In other words, if r^ is given by (14:5), then 



alR,{\y+alR,(\,Y + 



. + a^i?i(X„)^=miiiimum. 



This result may be used to estimate a single eigen- 
value of A. In order to compute, for instance, the 
lowest eigenvalue Xi, we select r^ near to the corres- 
ponding eigenvector. The first term in the expan- 
sion being dominant, the smallest root of Ri{\) will 
be a good approximation of Xi, provided that i is not 
too small. Hence the last residual vanishes, being 
orthogonal to ro, ri, . . ., r^_t. It follows that Xn is 
the desired solution. 

17. Example, Legendre Polynomials 

Any known set of orthogonal ])olynomials yields 
an example of a cg-algorithm. Take, for instance, 
the Legendre polynomials. Adapted to the interval 
(0,1), they satisfy the recurrence relation 

i?,+>(X)=|±/ (l-2X)i?,.(X)-^ 7?.-,(X), 7?,(0) = 1. 

From (15:1) and (15:4) 



4i + 2, 



Cj=7 



i + 1' 



^-=ii+r ^''--'^ 



This gives the following result, let ^ be a symmetric 
matrix having the roots of the Legendre polynomial 
i?^(X) as eigenvalues, and let 

where 6,, . . ,, Cn are the normalized eigenvectors of 
A, and mi = o:?, m2 = o:2, . . ., m^=Q;^ are the 
weight-coefficients for the Gauss' mechanical quad- 
rature with respect to Rn. The cg-algorithm applied 
to AirQ yields the numbers a^, hi given by (17:1). 
Moreover, 



{ri,ri) = h^-lhi.2 



W^o/o) = 



2i + l 



{toITq) (i<n). 



Hence the residuals decrease during the alogrithm. 
It ma}^ be worth noting that the Rayleigh quotient 
of ri is 

{ri,Ari) ^ 1 ■ 6z-i^l 
Ir.f ai ai_^ 2 

All residual vectors have the same Rayleigh quotient. 
This shows that, unlike many other relaxation 
methods, the cg-process does not necessarily have 
the tendency of smoothing residuals. 

The Chebyshev polynomials yield an example 
where a^. hi are constant for ?>0. 

18. Continued Fractions 

Suppose that we have given a mass distribution of 
type (b) as described in section 14. The function 
m(X) is a step function with jumps at 0<^Xi<^X2<C. • . 
<X„</, the values of the jumps being mi, m,2, . . ., 
m„, respectively. It is well known ^^ that the orthog- 
onal polvnominals Rq{\), Ri(\), . . ., RnW, corre- 
sponding to this mass distribution, can be constructed 
by expanding the rational function 



F(\)- 



mi 



rrir 



X-Xi 



x-x« 



(18:1) 



in a continued fraction. The polynominal Ri{\) is 
the denominator of the -ith convergent. For our 
purposes it is convenient to write the continued 
fraction in the form 



^(X) = 



da—an\- 



-1 



6?i — aiX- 



C2 



(^2—^2^ — 



(18:2) 



^w-i 



dn-l dn-lX 



' n. S. Wall, Analytic Theory of Continued fractions, Van Norstrand (1948), 
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The denominators of the convergents are given by 
the recursion formulas 



Ri+i=(di~aiX)Ei — CiRi.i, Ro=l, 



--0. 



(18:3) 



This coincides with (15:1). However, in order to 
satisfy (14:3), the expansion must be carried out 
so that di = Ci-{-l, by virtue of (15:2). The numbers 
bi arc then given by (15:4). It is clear that 



F{\)- 



where 



^.-i(X) 



Q.n-iW = l^rn,U(\-\j). 



Let us translate these results into the r^-dimensional 
space given by our correspondence. As before, we 
construct a positive definite symmetric matrix A 
with eigenvalues Xi, . . ., X^. Let 6i, . . ., 6^ be 
corresponding eigenvectors of unit length and choose, 
as before, 



ro = aiei + 



+ anen, 0?i 



The eigenvalues are the reciprocals of the squares of 
the semiaxis of the (r^—1) -dimensional ellipsoid 
{x,Ax) = \. The hyperplane, (ro, a:)==0, cuts this 
ellipsoid in an (n — 2)-dimensional ellipsoid, £'^-2, the 
squares of whose semiaxis are given by the reciprocals 
of the zeros of the numerator ^/;^_i(X) of F{\). 

This follows from the fact that if Xq is a number- 
such that there is a vector Xqf^O orthogonal to fo 
having the property that {Axq,x) = \{xq,x) whenever 
(fo, .r) = 0, then Xq is the square of the reciprocal of 
the semiaxis of En_2 whose direction is given by ^^o- 
If the coordinate system is chosen so that the axes 
are given by ^i, . . ., 6^, respectively, then X==Xo 
satisfies the equation 



^.-i(X) = 



x,-x 





. . 





«! 





X2-X 


. . 





a-i 























^n- 



= 



ai 0:2 • • • an ys , 

as was to be proved. 

Let us call the zeros of Qn-iW the eigenvalues of A 
with respect to r^ and the polynomial Qn-\W the 
characteristic polynomial oj A with respect to r^. The 
rational function F{\) is accordingly the quotient of 
this polynomial and the characteristic polynomial of 
A , Hence we have , 

Theorem 18:1. The numbers a^-, hi connected with 
the cg-process of a matrix A and a vector Xq can be com- 



puted by expanding into a continued fraction the quo- 
tient built by the characteristic polynomial of A with 
respect to Tq and the ordinary characteristic polynomial 

This is the simplest form of the relation between a 
matrix A, a vector r^ and the numbers a^, bi of the 
corresponding cg-process. The theorem may be used 
to investigate the behavior of the a,, 6/ if the eigen- 
values of A and those with respect to Tq are given. 
The following special case is worth recording. If 
mi=m2= . . . =m„=l, the rational function is the 
logarithmic derivative of the characteristic poly- 
nomial. From theorem (18:1) follows 

Theorem 18:2. If the vector r^ of a cg-process is the 
sum of the normalized eigenvectors of A, the numbers 
ai, bi may be computed by expanding the logarithmic 
derivative of the characteristic polynomial of A into a 
continued fraction . 

Finalh", we are able to prove 

Theorem 18:3. There is no restriction whatever on 
the positive constants ai^ bi in the cg-process, that is, 
given two sequences of positive numbers ao, ai, . . ., 
an-i and bo, bi, . . ., bn-i, there is a symmetric positive 
definite matrix A and a vector r^ such that the cg- 
algorithm applied to A, r^ yield the given numbers. 

The demonstration goes along the following lines: 
From (15:2) and (15:4), we compute the numbers Ci, 
di, the Cj being again positive. Then we use the con- 
tinued fraction (18:2) to compute F{\) which we 
decompose into partial fractions to obtain (18:1). 
We show next that the numbers Xi, m^ appearing in 
(18:1) are positive. After this has been established, 
our correspondence finishes the proof. 

In order to prove that X,>0, m,>0 we observe that 
the ratio Ri^.i/Bi is a decreasing function of X, as can 
be seen from (18:3) by induction. Using this result, 
it is not too cUfficult to show that the polynomials 
i?o(X), i?i(X), . . ., i?n(X) build a Sturmian sequence 
in the following sense. The number of zeros of i?„(X) 
in any interval a< X< 6 is equal to the increase of the 
number of variations in sign in going from a to b. 
At the point Xq there are no variations in sign since 
i? .(0) = 1 for every i. At X= + 00 ^ there are exactly n 
variations because the coefficient of the highest power 
of X in RiW is (— l)'aoai . . . Ot^i. Therefore, the 
roots Xi, X2, . . ., X„ of RnM are real and positive. 

That the function F(\) is itself a decreasing func- 
tion of X follows directly from (18 : 2). Therefore, its 
residues mi, ^2, . . ., m„ are positive. 

In view of theorem. 18:3 the numbers at in a cg- 
process can increase as fast as desired. This result 
was used in section 8.2. Furthermore, the formula 

' In? 

shows that there is no restriction at all on the be- 
havior of the length of the residual vector during the 
cg-process. Hence, there are certainly examples 
where the residual vectors increase in length during 
the computation, as was stated earlier. This holds 
in spite of the fact that the error vector h—x decreases 
in length at each step. 
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19. Numerical Illustrations 

A number of numerical experiments have been 
made with the processes described in the preceding 
sections. A prehmuiary report on these experiments 
will be given in this section. In carrying out these 
experiments, no attempt was made to select those 
which favored tlie method. Normally, we selected 
those which might lead to difficulties. 

In carrying out these experiments three sets of 
formulas for a^•, bi were used in the symmetric case, 
namely, 



ai-- 



(Pi,ri) 
'{pi,Apiy 



h^=- 






(19:1) 



\ri\ 



{VuAv^) 



lr-|2 



ai = 



{pt,Api)di 



bi= 



\ri\' 



di, di=l — bi 



(19:2) 

"' (PuAVi) ' 
(19:3) 



In the nousymmctric case, we have used only the 
formulas 



ai= 



\A*r,\ 
\Ap,\'' 



hi = 



\A*r,, 



(19:4) 



Our experience thus far indicates that the best 
results are obtained by the use of (19:1). Formulas 
(19:2) were about as good as (19:1) except for very 
ill conditioned matrices. Most of our experiments 
were carried out with the use of (19:2) because they 
are somewhat simpk^r than (19:1). Formulas (19:3) 
were designed to improve the relations 



C'^z^^r+O^O, {pi,Api^x)=0, 



(19:5) 



which they did. Unfortunately, they disturbed the 
first of the relations 



{puri+i)=0, {j)i,Apt^{) = ^. 



(19:6) 



A reflection of the geometrical interpretation of the 
method will convince one that one should strive to 
satisfy the relations (19:6) rather than (19:5). It is 
for this reason that (19:1) appears to be considerably 
superior to (19:3). In place of (19:2), one can use 
the formidas 



ai = 



\ri\ 



{viAvii 



^^ Jri^-^^" ^^urd (19^20 



to correct rounding off errors. A preliminary 
experiment indicates that this choice is better than 
(19:2) and is perhaps as good as (19:1). 



A sufficient number of experiments have not been 
carried out as yet so as to determine the '^best" 
formulas to be used. Our experiments do indicate 
that floating operations should be used whenever 
possible. Wo have also observed that the results 
in the (n + l)st and 0^ + 2)n(l iterations are normally 
far superior to those obtained in the /ith iteration. 

Example 1. This (example was selected to ilhis- 
trate the method of conjugate gradients in case 
there are no rounding off errors. The matrix A 
was chosen to be the matrix 



1 2 

2 5 
-1 

1 2 



If we select k to ])e the vector (0,2, —1,1), the 
computation is simple. The results at each step 
are given in table 1. 

Nornnilly, the computation is not as simple as in- 
dicated in the preceding case. For example, if one 
selects the sohition h to be the vector (1,1,1,1), then 
k is the vector (3,9,5,6). The results with (0,0,0, 0) 
as the initial (estimate is given by table 2. 

Table 1. 



1 


1 





2 


6 








3 



Step 


Vector 


Components of the vector 


fli . 


bi-i 




2 


3 


4 





ro 


-1 








— 2 






1 






-1 




6 


1 


Xi 

Pi 
Api 






-6 





2 
2 




-I 
-1 






1 
1 

1 


6 . 


2 


X2 

r2 

P% 

Ap2 


-36 


-30 



+ 12 
2 
12 



-6 
-1 
-6 
-6 


6 
-5 


-G 


5/6 


5 


3 


Xz 

rz 

P3 

Apz 


-61 


-20 



22 

2 

10 

10 


-11 
4 

20 


G 





1/5 


2/3 


4 


2-4 


-65 


24 


-11 


^ 
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Table 2. 



step 


Vector 


a times componerLts of vector 


a 


1 


2 


3 


4 





ro 

po 

Apo 



3 
3 
22 



9 
9 
63 




5 
5 

27 




6 

6 

39 


1 
1 
1 
1 


1 


ri 

Pi 

Api 


453 

-316 

-1935 

-12854 


1359 

-495 

-2799 

-15585 


755 

933 

6461 

40701 


906 

123 

1140 

-4113 


iSi 
/Si 
/Si 71 
i8l71 


2 


Xi 
Ap2 


131702 

1689 

-116022 

-66471 


419553 

-34360 

-1684085 

-2579187 


298277 

-27345 

-381080 

-2140458 


304149 

73483 

3066641 

5685731 


/32 

/32 
/3272 
/8272 


3 


X3 

n 

pz 

Apz 


27589274 
542343 

41725242 
542343 


84526651 

-188185 

-15212135 

-188185 


62344884 

92550 

6969632 

92550 


73103513 
-66019 

-3788997 
-66019 


/83 
/S3 

/S373 
/3373 


4 


Xi 
Ti 


1 



1 



1 



1 



1 
1 



181 = 1002, /32 =326123, 183 = 69314516, 
7i=/3i/151, 72=ft/8149, 73=/33/899615 

00 = 1/71, «l = 7l/72, 02 = 72/73, 03 = 73 

6o=8149/^;, 6i=899615/3iTi//S|, 62=380689i3272//3^ 

Table 3. 







k 


Xi 


Ti 


12-11 


1 








3 


3 





2 5 2 





1 





9 





3 


-10 6 








1 


5 





8 


12 3 








1 


6 





3 


12-11 


1 








3 


-3 





1 2 


-2 


1 





3 


3 





2 5 1 


1 





1 


8 





2 


12 


-1 





1 


3 





3 


12-11 


1 








3 


7 





1 2 


-2 


1 





3 


-1 





11 


5 


-2 


1 


2 


2 





12 


-1 





1 


3 





1 


12-11 


1 








3 


1 





1 2 


-2 


1 





3 


1 





11 


5 


_2 


1 


2 


1 





1 


-6 


2 


-1 1 


1 


1 






The system just described is particularly well 
suited for elimination. In case k is the vector 
(3, 9, 5, 6) the procedure described in section 12 yields 
the results given in table 3. In this table, we start 
with the matrices A and 7. These matrices are 
transformed into the matrices P*A and P* given at 
the bottom of the table. 

It is of interest to compare the error vectors 
yi=h—Xi obtained by the two methods just described 
with k={3, 9, 5, 6). The error |?/^| is given in the 
following table. 



M 


cg-method 


Elimination method 


I2/0I 


2.0 


2.00 


I2/1I 


0.7 


2.65 


I2/2I 


.67 


4.69 


I2/3I 


.65 


6.48 


l?/4| 


.0 


0.00 



In the cg-method \yi\ decreases monotonically, while 
in the elimination method \yi\ increases except for 
the last step. 

Example 2. In this case the matrix A was chosen 
to be the matrix 



. 263879 


.014799 


. 016836 


. 079773 


-. 020052 


. 011463 


-.014799^ 


. 249379 


. 028764 


. 057757 


-.056648 


-. 134493 


. 016836 


. 028764 


. 263734 


-. 033628 


-.012128 


. 084932 


. 079773 


. 067757 


-. 033628 


. 215331 


. 090696 


-. 037489 


-. 020052 


-. 056648 


-. 012128 


. 090696 


. 324486 


-. 022484 


. 011463 


-. 134493 


. 084932 


-. 037489 


-. 022484 


. 339271 



This matrix is a well-conditioned matrix, its eigen- 
values lying on the range Xi = . 6035^X^X6=4. 7357. 
The computations were carried out on an IBM 
card programmed calculator with about seven sig- 
nificant figures. The results for the case in which 
Xo is the origin and h the vector (1,1,1,1,1,1) are 
given in table 4. 

Example 3. A good illustration of the effects of 
rounding can be obtained by study of an ill-con- 
ditioned system of three equations with three un- 
knowns, namely, the system 

6a;+132/-170-l 
13:r+292/-38s=2 
— 17a:-38i/ + 50s=-3, > 

whose solution is x'=l, ?/=— 3, ^=—2. The system 
was constructed by E. Stiefel. The eigenvalues of 
A are given by the set 

Xi=.0588, X2=.2007, X3=84.7405. 

The ratio of the largest to the smallest eigenvalue 
is very large: X3/Xi = 1441. The formulas (19:1), 
(19:2), and (19 : 3) were used to compute the solution, 
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Table 4. 

starting vector A = (3.371, 1.2996, 3.4851, 3.7244, 3.0387, 2.412) 



Step 


Xi 


r,- 


Pi 


flt, bi 







3. 37100 


3. 37100 









1. 29960 


1. 29960 


00 = 3.092387 








3. 48510 


3. 48510 







3. 72440 


3. 72440 


bo=0. 02360156 







3. 03870 


3. 03870 









2. 41200 


2. 41200 






1. 042444 


-0. 3176047 


-0. 02380454 






. 4018866 


1.011922 


. 1042594 


01=3.487517 


1 


1. 077728 


. 2194351 


. 03016873 




1. 151729 


-.02954774 


. 005835219 


6i=0. 1411714 




. 9396836 


-. 3199108 


-.02481941 






. 7458837 


.03016107 


. 008708692 






. 9594250 


-0. 009951160 


-0. 1331168 






.7654931 


. 004267497 


. 1898594 


02=5.448597 


2 


1. 1829418 


-.01781102 


-. 1355206 






1. 1720790 


-. 009187803 


-. 08364038 


62=0.3997728 




.8531255 


.01514192 


. 1163813 






. 7762554 


. 03244676 


. 3367617 






. 8868953 


. 1476560 


. 009443967 


r 




. 8689395 


. 1042268 


. 01801273 


03 = 4. 580482 


3 


1. 1091023 


-. 1643885 


-.02185659 






1. 1265069 


—.0091902 


-. 004262730 


63=0.3769145 




. 9165367 


. 0633472 


.01098733 






. 9597427 


-. 0907231 


. 004390484 


r 




. 930153 


. 08593514 


. 1215308 






. 951447 


. 00050406 


. 06839666 


04 = 5.464933 




-1.008989 


. 05108757 


-. 03129307 




4 


-1. 106982 


-. 12107954 


-. 1371464 






. 966864 


-.03445640 


. 006956427 


64=0.2541540 




. 979853 


.03607941 


. 05262778 






. 996569 


-. 002365634 


.007231114 






. 988825 


.000616167 


. 02354492 


05 = 4.742589 




. 991887 


. 002508661 


.017133.37 




5 


1. 032032 


. 003267702 


-.06753326 






. 970666 


. 00(5006834 


. 061836.34 


65 = 




1. 008614 


. 003155791 


-. 01818237 






. 999998 


-. 00000252 








. 999991 


-. 00000084 






6 


1. 000013 
1. 000004 
. 999992 
. 999991 


-. 00002271 
. 00000645 
. 00001 a36 
. 00000825 







keeping five significant figures at all times. For 
comparison, the computation was carried out also 
with 10 digits, using (19:2). The results are given 
in table 5. In the third iteration, formula (19:1) 
gave the better result. In the fourth iteration, 
formulas (19:1) and (19:2) were equally good, and 
superior to (19:3). The solution was also carried out 
by the elimination method using onl}' five significant 
figures. The results are 





cg-method (19:1) 


Elimination 


X 

y 

z 


. 99424 
-2.99518 
-1. 99328 


1. 00603 
-3.00506 
-2.00180 



Table 5 
a:o= (1,0,0) 



Case. 


1 

Formula (19:2) 


2 

Formula (19:3) 


3 

Formula (19:1) 


1 with 10 digits 




5 

11 

-14 


5 

11 

-14 


5 

11 

-14 


5 

11 

-14 


CO 


.011804 


.011804 


.011804 


. 01180409347 


Ji 


. 94098 

-. 12984 

. 16526 


. 94098 

-.12984 

. 16526 


.94098 

-. 12984 

. 16526 


. 9409795326 

-. 1298450282 

. 1652573086 


r\ 


. 14856 
. 18754 
.20021 


. 14856 
. 18754 
.20021 


.14856 
. 18754 
. 20021 


. 1485175838 
.1874503815 
.2003244444 


ri2 


. 097325 


. 097325 


.097325 


.09732500125 


60 


. 00028458 


. 00028458 


.00027639 


. 0002845760270 


Pi 


. 14998 
. 19007 
. 19623 


. 14998 
. 19007 
. 19623 


. 14994 
. 19058 
. 19634 


. 1499404639 
. 1905807178 
. 1963403800 


a\ 


7.0058 


7. 0393 


7.0059 


7. 006740263 


XI 


-. 10975 
-1.46564 
-1.20949 


-. 11477 
-1. 47202 
-1. 21606 


-. 10948 
-1.46502 
-1.21028 


-. 1096143529 
-1.4651946170 
-1.2104487372 


T2 


-. 15045 
. 030400 

. 085455 


-. 15188 
. 029648 
.084906 


-. 12747 
.081611 
. 018197 


-. 1275876043 
. 0814215368 
.0184025802 


ri^ 


. 030682 


.031156 


. 023240 


. 02324671838 


61 


.31710 


. 31860 


.23870 


. 2388565947 


VI 


-. 10289 
. 090861 
. 14768 


-. 10387 
. 090685 
. 14772 


-. 091679 
. 12710 
. 065039 


-.0917733357 
. 1269429981 
. 0652997748 


02 


. .047688 


. 047713 


12. 039 


12. 09069098 


x% 


-. 10484 
-1. 46997 
-1.21653 


-. 05079 
-1. 34651 
-1. 38837 


. 99424 
-2. 99518 
-1. 99328 


. 9999886893 
-3. 000023179 
-1.999968135 


r% 


-.057616 

. 23615 
-. 18543 


-. 058572 

. 23643 
-. 18733 


-. 086092 

-. 19036 

. 25063 


-. 0009108898 

-. 0020300857 

.0026663150 


n^ 


. 093471 


. 094422 


. 10646 


. 000012060204 


62 


3. 0287 


3. 0306 


4. 5804 


. 000518791676 


Vi 


-. 36924 
. 51134 
. 26185 


-. 37336 
. 51126 
. 26035 


-.50602 
. 39181 
. 54853 


-.0009585010 

-. 0019642287 

. 0027001920 


03 


2. 9923 


2. 9762 


. 011854 


.0118007358 


Xi 


1. 00004 
-3. 00005 
-2. 00006 


1.06040 
-2. 86812 
-2. 16322 


1. 00024 
-2. 99982 
-1. 99978 


1. 0000000003 
-2. 9999999997 
-1. 9999999993 


Tk 


. 00064408 
. 0014340 
-. 0018823 


. 00014843 

-. 00035647 

. 00094441 


.00005181 
. 0000152 
. 0000364 




. 00000000008 
. 00000000002 



435 



In this case the results hy the cg-method and ehmin- 
ation method appear to he equally effective. The 
cg-method has the advantage that an improvement 
can be made b}^ taking one additional step. 

This example is also a good illustration for the 
fact that the size of the residuals is not a reliable 
criterion for how close one is to the solution. Id 
step 3 the residuals in case 1 are smaller than those 
of case 3, although the estimate in case 1 is very far 
from the right solution^ whereas in case 3 we are 
close to it. 

Further examples. The largest system that has 
been solved by the cg-method is a linear, symmetric 
system of 106 difference equations. The computa- 
tion was done on the Zuse relay-computer at the 
Institute for Applied Mathematics in Zurich. The 
estimate obtained in the 90th step was of sufficient 
accuracy to be acceptable. ^^ 

15 See U. Hochstrasser, " Die Anwenduns der Methode der konjugierten Gradi- 
<?nten und ihrer Modifikationen auf die Losung linearer Randwertprobleme," 
Thesis E. T. H., Zurich, Switzerland, in manuscript. 



Several symmetric systems, some involving as 
many as twelve unknowns, have been solved on the 
IBM card programed calculator. In one case, 
where the ratio of the largest to the smallest eigen- 
value was 4.9, a satisfactory solution has been 
obtained already in the third step; in another case, 
where this ratio was 100, one had to carry out fifteen 
steps in order to get an estimate with six correct 
digits. In these computations floating operations 
were not used. At all times an attempt was made 
to keep six or seven significant figures. 

The cg-method has also been applied to the 
solution of small nonsymmetric systems on the 
SWAC. The results indicate that the method is 
very suitable for high speed machines. 

A report on these experiments is being prepared 
at the National Bureau of Standards, Los Angeles. 
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